<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"><channel><title>tripleo-specs</title><link>https://specs.openstack.org/openstack/tripleo-specs</link><description /><language>en</language><copyright>OpenStack Foundation</copyright><item><title>Decouple TripleO Tasks</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/zed/decouple-tripleo-tasks.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/decouple-tripleo-tasks"&gt;https://blueprints.launchpad.net/tripleo/+spec/decouple-tripleo-tasks&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec proposes decoupling tasks across TripleO by organizing them so
that they are grouped by what they manage. The desire is to be
able to better isolate and minimize what tasks need to be run for specific
management operations. The process of decoupling tasks is implemented through
moving tasks into standalone native ansible roles and playbooks in tripleo-ansible.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;TripleO presently manages the entire software configuration of the overcloud at
once each time &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;deploy&lt;/span&gt;&lt;/code&gt; is executed. Regardless of
whether nodes were already deployed, require a full redeploy for some reason,
or are new nodes (scale up, replacement), all tasks are executed. The
functionality of only executing needed tasks lies within Ansible.&lt;/p&gt;
&lt;p&gt;The problem with relying entirely on Ansible to determine if any changes are
needed is that it results in long deploy times. Even if nothing needs to be
done, it can take hours just to have Ansible check each task in order to make
that determination.&lt;/p&gt;
&lt;p&gt;Additionally, TripleO’s reliance on external tooling (Puppet, container config
scripts, bootstrap scripts, etc) means that tasks executing those tools
&lt;strong&gt;must&lt;/strong&gt; be executed by Ansible as Ansible does not have the necessary data
needed in order to determine if those tasks need to be executed or not. These
tasks often have cascading effects in determining what other tasks need to be
run. This is a general problem across TripleO, and is why the model of just
executing all tasks on each deploy has been the accepted pattern.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;The spec proposes decoupling tasks and separating them out as needed to manage
different functionality within TripleO. Depending on the desired management
operation, tripleoclient will contain the necessary functionality to trigger
the right tasks. Decoupling and refactoring tasks will be done by migrating to
standalone ansible roles and playbooks within tripleo-ansible. This will allow
the standalone ansible artifacts from tripleo-ansible to be reused
natively with just &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ansible-playbook&lt;/span&gt;&lt;/code&gt;. At the same time, the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-heat-templates&lt;/span&gt;&lt;/code&gt; interfaces are maintained by consuming the new roles
and playbooks from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-ansible&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;There are 3 main changes proposed to implement this spec:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Refactor ansible tasks from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-heat-templates&lt;/span&gt;&lt;/code&gt; into standalone roles
in tripleo-ansible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Develop standalone playbooks within tripleo-ansible to consume the
tripleo-ansible roles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update tripleo-heat-templates to use the standalone roles and playbooks from
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-ansible&lt;/span&gt;&lt;/code&gt; with new &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;role_data&lt;/span&gt;&lt;/code&gt; interfaces to drive specific
functionality with new &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt;&lt;/code&gt; commands.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Writing standalone roles in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-ansible&lt;/span&gt;&lt;/code&gt; will largely be an exercise of
copy/paste from tasks lists in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-heat-templates&lt;/span&gt;&lt;/code&gt;. As tasks are moved
into standalone roles, tripleo-heat-templates can be directly updated to run
tasks from those roles using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;include_role&lt;/span&gt;&lt;/code&gt;. This pattern is already well
established in tripleo-heat-templates with composable services that use
existing standalone roles.&lt;/p&gt;
&lt;p&gt;New playbooks will be developed within tripleo-ansible to drive the standalone
roles using pure &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ansible-playbook&lt;/span&gt;&lt;/code&gt;. These playbooks will offer a native
ansible experience for deploying with tripleo-ansible.&lt;/p&gt;
&lt;p&gt;The design principles behind the standalone roles and playbooks are:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Native execution with ansible-playbook, an inventory, and variable files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No Heat. While Heat remains part of the TripleO architecture, it has no
bearing on how the native ansible is developed in tripleo-ansible.
tripleo-heat-templates can consume the standalone ansible playbooks and
roles from tripleo-ansible, but it does not dictate the interface. The
interface should be defined for native ansible best practices.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No puppet. As the standalone roles are developed, they will not rely on
puppet for configuration or any other tasks. To allow integration with
tripleo-heat-templates and existing TripleO interfaces (Hiera, Heat
parameters), the roles will allow skipping config generation and other parts
that use puppet so that pieces can be overridden by
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-heat-templates&lt;/span&gt;&lt;/code&gt; specific tasks. When using native Ansible,
templated config files and native ansible tasks will be used instead of
Puppet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;While the decoupled tasks will allow for cleaner interfaces for executing
just specific management operations, all tasks will remain idempotent. A
full deployment that re-runs all tasks will still work, and result in no
effective changes for an already deployed cloud with the same set of inputs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The standalone roles will use separated task files for each decoupled
management interface exposed. The playbooks will be separated by management
interface as well to allow for executing just specific management functionality.&lt;/p&gt;
&lt;p&gt;The decoupled management interfaces are defined as:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;bootstrap&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;install&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;pre-network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;configure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;container-config&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;service-bootstrap&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;New task interfaces in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-heat-templates&lt;/span&gt;&lt;/code&gt; will be added under
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;role_data&lt;/span&gt;&lt;/code&gt; to correspond with the new management interfaces, and consume the
standalone ansible from tripleo-ansible. This will allow executing just
specific management interfaces and using the standalone playbooks from
tripleo-ansible directly.&lt;/p&gt;
&lt;p&gt;New subcommands will be added to tripleoclient to trigger the new management
interface operations, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;install&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt;
&lt;span class="pre"&gt;configure&lt;/span&gt;&lt;/code&gt;, etc.&lt;/p&gt;
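&lt;p&gt;A minimal sketch of how a per-interface subcommand might resolve to a
standalone playbook run follows. The playbook file names, the directory layout,
and the helper functions are hypothetical and are not defined by this spec; only
the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ansible-playbook&lt;/span&gt;&lt;/code&gt; options &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;-i&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;-e&lt;/span&gt;&lt;/code&gt; are existing interfaces.&lt;/p&gt;

```python
import shlex

# Management interfaces listed in this spec, in the order given there.
MANAGEMENT_INTERFACES = (
    "bootstrap",
    "install",
    "pre-network",
    "network",
    "configure",
    "container-config",
    "service-bootstrap",
)


def build_command(interface, inventory, vars_file, playbook_dir="playbooks"):
    """Return the ansible-playbook argv for one management interface.

    Assumption: one playbook per interface, named "INTERFACE.yaml" under
    playbook_dir. The real names and layout in tripleo-ansible may differ.
    """
    if interface not in MANAGEMENT_INTERFACES:
        raise ValueError("unknown management interface: " + interface)
    playbook = playbook_dir + "/" + interface + ".yaml"
    return ["ansible-playbook", "-i", inventory, "-e", "@" + vars_file, playbook]


def build_full_deploy(inventory, vars_file):
    """Return one command per interface, covering every listed interface."""
    return [build_command(name, inventory, vars_file)
            for name in MANAGEMENT_INTERFACES]


if __name__ == "__main__":
    # Print the command line that a hypothetical "configure" subcommand would run.
    print(shlex.join(build_command("configure", "inventory.yaml", "vars.yaml")))
```

&lt;p&gt;Keeping the mapping as plain data keeps the selection logic out of the
playbooks themselves, which matches the goal of running only the tasks for one
management operation.&lt;/p&gt;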
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;deploy&lt;/span&gt;&lt;/code&gt; would continue to function as it presently does
by doing a full assert of the system state with all tasks. The underlying
playbook, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;deploy_steps_playbook.yaml&lt;/span&gt;&lt;/code&gt;, would be updated as necessary to
include the other playbooks so that all tasks can be executed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;dl class="field-list simple"&gt;
&lt;dt class="field-odd"&gt;Alternative 1 - Use --tags/--skip-tags&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-odd"&gt;&lt;p/&gt;&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;With &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--tags&lt;/span&gt;&lt;/code&gt; / &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--skip-tags&lt;/span&gt;&lt;/code&gt;, tasks could be selectively executed. In the
past this has posed other problems within TripleO. Using tags does not allow
for composing tasks to the level needed, and often results in running tasks
when not needed or forgetting to tag needed tasks. Having to add the special
cased &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;always&lt;/span&gt;&lt;/code&gt; tag becomes necessary so that certain tasks are run when
needed. The tags become difficult to maintain as it is not apparent what tasks
are tagged when looking at the entire execution. Additionally, not all
operations within TripleO map to Ansible tasks one to one. Container startups
are declared in a custom YAML format, and that format is then used as input to
a task. It is not possible to tag individual container startups unless tag
handling logic is added to the custom modules used for container startup.&lt;/p&gt;
&lt;dl class="field-list simple"&gt;
&lt;dt class="field-odd"&gt;Alternative 2 - Use --start-at-task&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-odd"&gt;&lt;p/&gt;&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--start-at-task&lt;/span&gt;&lt;/code&gt; is likewise problematic, and it does not truly
partition the full set of tasks. Tasks would need to be reordered anyway across
much of TripleO so that &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--start-at-task&lt;/span&gt;&lt;/code&gt; would work. It would be more
straightforward to separate by playbook if a significant number of tasks need
to be reordered.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Special consideration should be given to security related tasks to ensure that
the critical tasks are executed when needed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;Upgrade and update tasks are already separated out into their own playbooks.
There is an understanding, however, that the full &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;deploy_steps_playbook.yaml&lt;/span&gt;&lt;/code&gt; is
executed after an update or upgrade. This full set of tasks could end
up being reduced if tasks are sufficiently decoupled in order to run the
necessary pieces in isolation (config, bootstrap, etc).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Users will need to be aware of the limitations of using the new management
commands and playbooks. The expectation within TripleO has always been that the
entire state of the system is re-asserted on scale up and configure operations.&lt;/p&gt;
&lt;p&gt;While the ability to still do a full assert would be present, it would no
longer be required. Operators and users will need to understand that only
running certain management operations may not fully apply a desired change. If
only a reconfiguration is done, it may not imply restarting containers. With
the move to standalone and native ansible components, with less
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;config-download&lt;/span&gt;&lt;/code&gt; based generation, it should be more obvious what each
playbook is responsible for managing. The native ansible interfaces will help
operators reason about what needs to be run and when.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Performance should be improved for the affected management operations due to
having to run fewer tasks, and being able to run only the tasks needed for a
given operation.&lt;/p&gt;
&lt;p&gt;There should be no impact when running all tasks. Tasks must be refactored in
such a way that the overall deploy process when all tasks are run is not made
slower.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;TripleO developers will be responsible for updating the service templates that
they maintain in order to refactor the tasks.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;James Slagle &amp;lt;&lt;a class="reference external" href="mailto:jslagle%40redhat.com"&gt;jslagle&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Existing CI jobs would cover changes to task refactorings.
New CI jobs could be added for the new isolated management operations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;New commands and playbooks must be documented.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/q/topic:standalone-roles"&gt;standalone-roles POC&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 22 Mar 2022 00:00:00 </pubDate></item><item><title>Zed placeholder</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/zed/placeholder.html</link><description>
 
</description><pubDate>Tue, 22 Mar 2022 00:00:00 </pubDate></item><item><title>Mixed Operating System Versions</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/mixed-operating-system-versions.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/mixed-operating-system-versions"&gt;https://blueprints.launchpad.net/tripleo/+spec/mixed-operating-system-versions&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec proposes that a single TripleO release supports multiple operating
system versions.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Historically a single branch or release of TripleO has supported only a single
version of an operating system at a time. In the past, this has been specific
versions of Ubuntu or Fedora in the very early days, and now has standardized
on specific versions of CentOS Stream.&lt;/p&gt;
&lt;p&gt;Upgrading to a later version of OpenStack involves first
upgrading the TripleO undercloud, and then upgrading the TripleO overcloud to
the later version of OpenStack. The problem with supporting only a single
operating system version at a time is that such an OpenStack upgrade typically
implies an upgrade of the operating system at the same time. Combining the
OpenStack upgrade with a simultaneous operating system upgrade is problematic
due to:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Upgrade complexity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upgrade time resulting in extended maintenance windows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operating system incompatibilities with running workloads (kernel, libvirt,
KVM, qemu, OVS/OVN, etc).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;User impact of operating system changes (docker vs. podman, network-scripts
vs. NetworkManager, etc).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;This spec proposes that a release of TripleO support 2 major versions of an
operating system, particularly CentOS Stream. A single release of TripleO
supporting two major versions of CentOS Stream will allow for an OpenStack
upgrade while remaining on the same operating system version.&lt;/p&gt;
&lt;p&gt;There are multiple software versions in play during an OpenStack upgrade:&lt;/p&gt;
&lt;dl class="field-list simple"&gt;
&lt;dt class="field-odd"&gt;TripleO&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-odd"&gt;&lt;p&gt;The TripleO version is the version of the TripleO related packages installed
on the undercloud. While some other OpenStack software versions are used here
(Ironic, Neutron, etc), for the purposes of this spec, all TripleO and
OpenStack software on the undercloud will be referred to as the TripleO
version. The TripleO version corresponds to an OpenStack version.
Examples: Train, Wallaby, Zed.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class="field-even"&gt;OpenStack&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-even"&gt;&lt;p&gt;The OpenStack version is the version of OpenStack on the overcloud that is
being managed by the TripleO undercloud.
Examples: Train, Wallaby, Zed.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class="field-odd"&gt;Operating System&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-odd"&gt;&lt;p&gt;The operating system version is the version of CentOS Stream. Both the
undercloud and overcloud have operating system versions. The undercloud and
the overcloud may not have the same operating system version, and all nodes
in the overcloud may not have the same operating system version.
Examples: CentOS Stream 8, 9, 10&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class="field-even"&gt;Container Image&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-even"&gt;&lt;p&gt;The container image version is the version of the base container image used
by tcib. This is a version of the Red Hat universal base image (UBI).
Examples: UBI 8, 9, 10&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;For the purposes of this spec, the operating system versions being discussed
will be CentOS Stream 8 and 9, while the OpenStack versions will be Train and
Wallaby. However, the expectation is that TripleO continues to support 2
operating system versions with each release going forward. Subsequently, the
Zed release of TripleO would support CentOS Stream 9 and 10.&lt;/p&gt;
&lt;p&gt;With the above version definitions and considerations in mind, a TripleO
managed upgrade from Train to Wallaby would be described as the following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Upgrade the undercloud operating system version from CentOS Stream 8 to 9.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upgrade the undercloud TripleO version from Train to Wallaby.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The Wallaby version of the TripleO undercloud will only run on CentOS Stream
9.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implies upgrading all TripleO and OpenStack software on the undercloud to
Wallaby.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upgrade the OpenStack version on the overcloud from Train to Wallaby&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Does not imply upgrading the operating system version from CentOS Stream 8
to 9.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implies upgrading to new container image versions that are the images for
OpenStack Wallaby. These container image versions will likely be service
dependent. Some services may use UBI version 9, while some may remain on UBI
version 8.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upgrade the operating system version on the overcloud nodes from CentOS
Stream 8 to 9.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Can happen node by node, subject to constraints that might include
requiring all control plane nodes to be upgraded at the same time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data plane nodes could be selectively upgraded.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
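&lt;p&gt;The node-by-node operating system upgrade in the last step can be
illustrated with a small batching sketch. It assumes one possible constraint,
namely that all control plane nodes share a single batch while each data plane
node is upgraded on its own. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;control&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;data&lt;/span&gt;&lt;/code&gt; labels and the function name are
hypothetical and not part of TripleO.&lt;/p&gt;

```python
def plan_os_upgrade_batches(nodes):
    """Group nodes into operating system upgrade batches.

    nodes maps a node name to "control" or "data" (hypothetical labels).
    Assumption: every control plane node shares a single batch, while
    each data plane node forms its own batch and can be upgraded later.
    """
    control = sorted(name for name, plane in nodes.items() if plane == "control")
    data = sorted(name for name, plane in nodes.items() if plane == "data")
    # The control plane batch comes first only because this sketch orders it
    # that way; the spec does not prescribe an ordering between batches.
    batches = [control] if control else []
    batches.extend([name] for name in data)
    return batches


if __name__ == "__main__":
    example = {"controller-0": "control", "controller-1": "control",
               "compute-0": "data", "compute-1": "data"}
    for batch in plan_os_upgrade_batches(example):
        print(batch)
```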
&lt;p&gt;The default behavior will be that users and operators can choose to upgrade to
CentOS Stream 9 separately from the OpenStack upgrade. For those operators who
want a combined OpenStack and operating system upgrade to match previous FFU
behavior, they can perform both upgrades back to back. The OpenStack and
operating system upgrades will be separate processes. There may be UX around
making the processes appear as one, but that is not prescribed by this spec.&lt;/p&gt;
&lt;p&gt;New TripleO deployments can choose either CentOS Stream 8 or 9 for their
Overcloud operating system version.&lt;/p&gt;
&lt;p&gt;The implication with such a change is that the TripleO software needs to know
how to manage OpenStack on different operating system versions. Ansible roles,
puppet modules, shell scripts, etc, all need to remove any assumptions about a
given operating system and be developed to manage both CentOS Stream 8 and 9.
This includes operating system utilities that may function quite differently
depending on the underlying version, such as podman and container-tools.&lt;/p&gt;
&lt;p&gt;CentOS Stream 8 support could not be dropped until the Zed release of TripleO,
at which time, support would be needed for CentOS Stream 9 and 10.&lt;/p&gt;
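&lt;p&gt;The release-to-operating-system pairing described above can be sketched as
a small lookup table. The Wallaby and Zed entries come from this spec; the
function names, the node names, and the idea of reading &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;VERSION_ID&lt;/span&gt;&lt;/code&gt; from
os-release style text are assumptions made for illustration, not an existing
TripleO interface.&lt;/p&gt;

```python
# Overcloud operating system versions per TripleO release, as stated in
# this spec: Wallaby manages CentOS Stream 8 and 9; Zed would manage 9 and 10.
SUPPORTED_OVERCLOUD_OS = {
    "wallaby": frozenset({8, 9}),
    "zed": frozenset({9, 10}),
}


def parse_major_version(os_release_text):
    """Extract the major version from os-release style text (VERSION_ID=...)."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VERSION_ID":
            return int(value.strip().strip('"').split(".")[0])
    raise ValueError("VERSION_ID not found")


def is_supported(release, os_major):
    """Return True if the release manages overcloud nodes on this OS version."""
    return os_major in SUPPORTED_OVERCLOUD_OS.get(release.lower(), frozenset())


def unsupported_nodes(release, node_versions):
    """Return the sorted names of nodes whose OS version the release does not manage."""
    return sorted(name for name, major in node_versions.items()
                  if not is_supported(release, major))


if __name__ == "__main__":
    # Hypothetical overcloud with mixed operating system versions.
    nodes = {"controller-0": 9, "compute-0": 8}
    print(unsupported_nodes("zed", nodes))
```

&lt;p&gt;A check of this kind could run before an OpenStack upgrade to list the
nodes that must first move to a newer operating system version.&lt;/p&gt;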
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;dl class="field-list simple"&gt;
&lt;dt class="field-odd"&gt;Alternative 1&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-odd"&gt;&lt;p&gt;The TripleO undercloud Wallaby version could support running on both CentOS
Stream 8 and 9. There does not seem to be much benefit in supporting both.
Some users may refuse to introduce 9 into their environments at all, but
TripleO has not encountered similar resistance in the past.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class="field-even"&gt;Alternative 2&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-even"&gt;&lt;p&gt;When upgrading the overcloud to the OpenStack Wallaby version, it could be
required that all control plane nodes go through an operating system upgrade
as well. Superficially, this appears to reduce the complexity of the
development and test matrix. However, given the nature of composable roles,
this requirement would really need to be prescribed per-service, and not
per-role. Enforcing such a requirement would be problematic given the
flexibility of running any service on any role. It would instead be better
that TripleO document what roles need to be upgraded to a newer operating
system version at the same time, by documenting a set of already provided
roles or services. E.g., all nodes running a pacemaker managed service need
to be upgraded to the same operating system version at the same time.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class="field-odd"&gt;Alternative 3&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-odd"&gt;&lt;p&gt;A single container image version could be used for all of OpenStack Wallaby. In
order to support running those containers on both CentOS Stream 8 and 9, the
single UBI container image would likely need to be 8, as anticipated support
statements may preclude support for running UBI 9 images on 8.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt class="field-even"&gt;Alternative 4&lt;span class="colon"&gt;:&lt;/span&gt;&lt;/dt&gt;
&lt;dd class="field-even"&gt;&lt;p&gt;New deployments could be forced to use CentOS Stream 9 only for their
overcloud operating system version. However, some users may have workloads
that have technical or certification requirements that could require CentOS
Stream 8.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;This proposal is meant to improve the FFU process by separating the OpenStack
and operating system upgrades.&lt;/p&gt;
&lt;p&gt;Most users and operators will welcome this change. Some may prefer the old
method which offered a more simultaneous and intertwined upgrade. While the new
process could be implemented in such a way to offer a similar simultaneous
experience, it will still be different and likely appear as 2 distinct steps.&lt;/p&gt;
&lt;p&gt;Distinct steps should result in an overall simplification of the upgrade
process.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The previous implementations of FFU had the OpenStack and operating system
upgrades intertwined in the way that they were performed.  With the separation
of the upgrade processes, the combined upgrade of both OpenStack and the
operating system may take longer. Operators would need
to plan for longer maintenance windows in the cases where they still want to
upgrade both during the same window.&lt;/p&gt;
&lt;p&gt;Otherwise, operators can choose to upgrade just OpenStack first, and then the
operating system at a later date, resulting in multiple, but shorter,
maintenance windows.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;TripleO developers will need to support managing OpenStack software across
multiple operating system versions.&lt;/p&gt;
&lt;p&gt;Service developers responsible for TripleO integrations will need to decide
upgrade requirements around their individual services when it comes to
container image versions and supporting different operating system versions.&lt;/p&gt;
&lt;p&gt;Given that the roll out of CentOS Stream 9 support in TripleO has happened in a
way that overlaps with supporting 8, it is largely true today that TripleO
Wallaby already supports both 8 and 9. CI jobs exist that test Wallaby on both
8 and 9. Going forward, that needs to remain true.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&amp;lt;launchpad-id or None&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&amp;lt;launchpad-id or None&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;tripleo-ansible - CentOS Stream 8 and 9 support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates - CentOS Stream 8 and 9 support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;puppet-tripleo - CentOS Stream 8 and 9 support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;puppet-* - CentOS Stream 8 and 9 support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tcib - build right container image versions per service&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CentOS Stream 9 builds will be required to fully develop and test this change.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;FFU is not typically tested in upstream CI. However, CI will be needed that
tests deploying OpenStack Wallaby on both CentOS Stream 8
and 9 in order to verify that TripleO Wallaby is compatible with both operating
system versions.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The matrix of supported versions will need to be documented within
tripleo-docs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 15 Mar 2022 00:00:00 </pubDate></item><item><title>TripleO Ceph Ingress Daemon Integration</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/yoga/tripleo_ceph_ingress.html</link><description>
 
&lt;p&gt;Starting in the Octopus release, Ceph introduced its own day1 tool called
cephadm and its own day2 tool called orchestrator, which replaced ceph-ansible.
During the Wallaby and Xena cycles TripleO moved away from ceph-ansible and
adopted cephadm &lt;a class="footnote-reference brackets" href="#id19" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; as described in &lt;a class="footnote-reference brackets" href="#id20" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
During the Xena cycle a new approach to deploying Ceph in a TripleO context was
established: a Ceph cluster can now be provisioned before the overcloud is
created, leaving the final configuration of the Ceph cluster, which depends on
the OpenStack services enabled through the tripleo-heat-templates interface,
to the overcloud deployment phase.
The next goal is to deploy as many Ceph services as possible using the deployed
ceph interface instead of during overcloud deployment.
As part of this effort, we should pay attention to the high-availability aspect,
how it’s implemented in the current release and how it should be changed for
Ceph.
This spec is a follow-up to &lt;a class="footnote-reference brackets" href="#id21" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. It defines the requirements for relying
on the Ceph provided HA daemons and describes the changes required in TripleO
to meet this goal.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;In the following description we refer to the Ganesha daemon and the
need to deploy the related Ceph Ingress daemon, but the same applies to
all the existing daemons that require a high-availability configuration
(e.g., RGW and the Ceph dashboard for the next Ceph release).
In TripleO we support deployment of Ganesha both when the Ceph cluster is
managed by TripleO and when it is not.
When the cluster is managed by TripleO, as per spec &lt;a class="footnote-reference brackets" href="#id21" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, it is preferable to
have cephadm manage the lifecycle of the NFS container instead of deploying it
with tripleo-ansible, and this is broadly covered and solved by allowing the
tripleo Ceph mkspec module to support the new Ceph daemon &lt;a class="footnote-reference brackets" href="#id22" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
The ceph-nfs daemon deployed by cephadm has its own HA mechanism, called
ingress, which is based on haproxy and keepalived &lt;a class="footnote-reference brackets" href="#id23" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; so we would no longer
use pcmk as the VIP owner.
This means we would run pcmk and keepalived in addition to haproxy (deployed by
tripleo) and another haproxy (deployed by cephadm) on the same server (though
with listeners on different ports).
This approach only relies on Ceph components, and both external and internal
scenarios are covered.
However, adopting the ingress daemon for a TripleO deployed Ceph cluster means
that we need to make the overcloud aware of the new running services: for
this reason the proposed change is meant to introduce a new TripleO resource
that properly handles the interface with the Ceph services and is consistent
with the tripleo-heat-templates roles.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The change proposed by this spec requires the introduction of a new TripleO
Ceph Ingress resource that describes the ingress service that provides load
balancing and HA.
The impact of adding a new &lt;cite&gt;OS::TripleO::Services::CephIngress&lt;/cite&gt; resource can
be seen on the following projects.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="tripleo-common"&gt;
&lt;h3&gt;tripleo-common&lt;/h3&gt;
&lt;p&gt;As described in Container Image Preparation &lt;a class="footnote-reference brackets" href="#id24" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; the undercloud may be used as
a container registry for all the ceph related containers, and a new, supported
syntax has been introduced to &lt;cite&gt;deployed ceph&lt;/cite&gt; to download containers from
authenticated registries.
However, as per &lt;a class="footnote-reference brackets" href="#id25" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, the Ceph ingress daemons won’t be baked into the Ceph
daemon container, hence &lt;cite&gt;tripleo container image prepare&lt;/cite&gt; should be executed to
pull the new container images/tags into the undercloud, as is done for the Ceph
Dashboard and the regular Ceph image.
Once the ingress containers are available, it’s possible to deploy the daemon
on top of ceph-nfs or ceph-rgw.
In particular, if this spec is implemented, &lt;cite&gt;deployed ceph&lt;/cite&gt; will be
the only way of setting up this daemon through cephadm for ceph-nfs. The result
is a simpler tripleo-heat-templates interface and fewer tripleo-ansible tasks
to execute, because part of the configuration moves to before the overcloud is
deployed.
As part of this effort, considering that the Ceph related container images have
grown over time, a new condition will be added to the tripleo-container jinja
template &lt;a class="footnote-reference brackets" href="#id26" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; to avoid pulling additional ceph images if Ceph is not deployed by
TripleO &lt;a class="footnote-reference brackets" href="#id28" id="id10" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;10&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
This will result in a new optimization for all the Ceph external cluster use cases,
as well as the existing CI jobs without Ceph.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="tripleo-heat-templates"&gt;
&lt;h3&gt;tripleo-heat-templates&lt;/h3&gt;
&lt;p&gt;A Heat resource will be created within the cephadm space. The new resource will
also be added to the existing Controller roles and all the relevant environment
files will be updated with the new reference.
In addition, as described in the spec &lt;a class="footnote-reference brackets" href="#id21" id="id11" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, pacemaker constraints for ceph-nfs
and the related vip will be removed.
The tripleo-common ceph_spec library is already able to generate the spec for
this kind of daemon and it will trigger cephadm &lt;a class="footnote-reference brackets" href="#id22" id="id12" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; to deploy an ingress daemon
provided that the NFS Ceph spec is applied against an existing cluster and the
backend daemon is up and running.
As mentioned before, the ingress daemon can also be deployed on top of an RGW
instance, therefore the proposed change is valid for all the Ceph services that
require an HA configuration.&lt;/p&gt;
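&lt;p&gt;As a minimal sketch of the environment file update mentioned above, the new
service could be mapped through a &lt;cite&gt;resource_registry&lt;/cite&gt; entry like the following.
The template path is an assumption made for illustration (this spec only states
that the resource is created within the cephadm space), so the real location
may differ.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;---
resource_registry:
  # Hypothetical template path, shown only to illustrate the mapping.
  OS::TripleO::Services::CephIngress: ../deployment/cephadm/ceph-ingress.yaml
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;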
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The ingress daemon applied to an existing ceph-nfs instance is managed by
cephadm, resulting in a simplified model in terms of lifecycle. A Ceph spec for
the ingress daemon is generated right after the ceph-nfs instance is applied,
and as per &lt;a class="footnote-reference brackets" href="#id23" id="id13" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; it requires two additional options:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;frontend_port&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;monitoring_port&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The two ports are required by haproxy to accept incoming requests and for
monitoring purposes, hence we need to make TripleO aware of this new service
and properly set up the firewall rules. As long as the ports defined by the spec
are passed to the overcloud deployment process and defined in the
tripleo-heat-templates CephIngress daemon resource, the &lt;cite&gt;firewall_rules&lt;/cite&gt;
tripleo ansible role is run and rules are applied for both the frontend and
monitoring port. The usual network used by this daemon (and affected by the new
applied rules) is the &lt;cite&gt;StorageNFS&lt;/cite&gt;, but we might have cases where an operator
overrides it.
The lifecycle, builds and security aspects of the container images associated
with the CephIngress resource are not managed by TripleO; the Ceph
organization takes care of maintenance and updates.&lt;/p&gt;
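&lt;p&gt;For illustration, an ingress service spec of the kind described in the cephadm
documentation might look like the sketch below. The service id, placement
count, port numbers and virtual IP are placeholder values chosen for this
example, not TripleO defaults. Note that the cephadm documentation names the
monitoring option &lt;cite&gt;monitor_port&lt;/cite&gt;.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;---
service_type: ingress
service_id: nfs.cephfs          # placeholder id
placement:
  count: 2                      # placeholder count
spec:
  backend_service: nfs.cephfs   # the ceph-nfs service fronted by ingress
  frontend_port: 2049           # haproxy listener for incoming requests
  monitor_port: 9000            # haproxy status/monitoring endpoint
  virtual_ip: 172.17.5.47/24    # placeholder VIP
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In a spec like this, the two port values are the ones the firewall rules
would have to open on the network used by the daemon.&lt;/p&gt;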
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;The problem of an existing Ceph cluster is covered by the spec &lt;a class="footnote-reference brackets" href="#id26" id="id14" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Since two new images (and the equivalent tripleo-heat-templates services) have
been introduced, some time is required to pull these new additional containers
in the undercloud. However, the tripleo_containers jinja template has been
updated, splitting off the Ceph related container images. In particular, during
the containers image prepare phase, a new boolean option has been added and
pulling the Ceph images can be avoided by setting the &lt;cite&gt;ceph_images&lt;/cite&gt; boolean to
false. By doing this we can improve performance when Ceph is not required.&lt;/p&gt;
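&lt;p&gt;A sketch of how this could look in a container image prepare environment file
is shown below. It assumes the boolean is passed through the &lt;cite&gt;set&lt;/cite&gt; mapping of
&lt;cite&gt;ContainerImagePrepare&lt;/cite&gt; like other template variables. This spec does not state
where the option lives, so treat the placement as an assumption.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;---
parameter_defaults:
  ContainerImagePrepare:
    - push_destination: true
      set:
        # Assumed placement: skip the Ceph related images entirely.
        ceph_images: false
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;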
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;This effort can be easily extended to move the RGW service to deployed ceph,
which is out of scope of this spec.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="deployment-flow"&gt;
&lt;h3&gt;Deployment Flow&lt;/h3&gt;
&lt;p&gt;The deployment and configuration described in this spec will happen during
&lt;cite&gt;openstack overcloud ceph deploy&lt;/cite&gt;, as described in &lt;a class="footnote-reference brackets" href="#id26" id="id15" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
The current implementation of &lt;cite&gt;openstack overcloud network vip provision&lt;/cite&gt;
allows provisioning one VIP per network, which means that using the new Ceph
Ingress daemon (that requires 1 vip per service) can break components that
are still using the VIP provisioned on the storage network (or any other
network depending on the tripleo-heat-templates override specified) and
are managed by pacemaker.
A new option &lt;cite&gt;--ceph-vip&lt;/cite&gt; for the &lt;cite&gt;openstack overcloud ceph deploy&lt;/cite&gt; command
will be added &lt;a class="footnote-reference brackets" href="#id29" id="id16" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;11&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. This option may be used to reserve VIP(s) for each
Ceph service specified by the ‘service/network’ mapping defined as input.
For instance, a generic ceph service mapping can be something like the
following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="o"&gt;---&lt;/span&gt;
&lt;span class="n"&gt;ceph_services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ceph_nfs&lt;/span&gt;
    &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ceph_rgw&lt;/span&gt;
    &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For each service added to the list above, a virtual ip on the specified
network (that can be a composable network) will be created and used as
frontend_vip of the ingress daemon.
As described in the overview section, an ingress object will be defined
and deployed and this is supposed to manage both the VIP and the HA for
this component.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;fmount&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;fultonj&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gfidente&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create a new Ceph prefixed Heat resource that describes the Ingress daemon
in the TripleO context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add both haproxy and keepalived containers to the Ceph container list so that
they can be pulled during the &lt;cite&gt;Container Image preparation&lt;/cite&gt; phase.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a set of tasks to deploy both the nfs and the related ingress
daemon&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deprecate the pacemaker related configuration for ceph-nfs, including
pacemaker constraints between the manila-share service and ceph-nfs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create upgrade playbooks to transition from TripleO/pcmk managed nfs
ganesha to nfs/ingress daemons deployed by cephadm and managed by ceph
orch&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Depending on the state of the directord/task-core migration we might skip the
ansible part, though we could POC with it to get started, extending the existing
tripleo-ansible cephadm role.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;This work depends on the tripleo_ceph_nfs spec &lt;a class="footnote-reference brackets" href="#id21" id="id17" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; that moves from tripleo
deployed ganesha to the cephadm approach.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The NFS daemon feature can be enabled at day1 and it will be tested against
the existing TripleO scenario004 &lt;a class="footnote-reference brackets" href="#id27" id="id18" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
As part of the implementation plan, the update of the existing heat templates
environment CI files, which contain both the Heat resources and the testing
job parameters, is one of the goals of this spec.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The documentation will describe the new parameters introduced to the &lt;cite&gt;deployed
ceph&lt;/cite&gt; cli to give the ability to deploy additional daemons (ceph-nfs and the
related ingress daemon) as part of deployed ceph.
In addition, we should provide upgrade instructions for pre-existing environments
that need to transition from TripleO/pcmk managed nfs ganesha to nfs daemons
deployed by cephadm and managed by ceph orch.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id19" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/ceph/ceph/tree/master/src/cephadm"&gt;cephadm&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id20" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ceph.html"&gt;tripleo-ceph&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id21" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id3"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id4"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id11"&gt;3&lt;/a&gt;,&lt;a role="doc-backlink" href="#id17"&gt;4&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/yoga/tripleo_ceph_manila.html"&gt;tripleo-nfs-spec&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id22" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id5"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id12"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-ansible/+/818786"&gt;tripleo-ceph-mkspec&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id23" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id6"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id13"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.ceph.com/en/pacific/cephadm/nfs/#high-availability-nfs"&gt;cephadm-nfs-ingress&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id24" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;6&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html"&gt;container-image-preparation&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id25" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id8"&gt;7&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L55-L56"&gt;ceph-ingress-containers&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id26" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id9"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id14"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id15"&gt;3&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-common/blob/master/container-images/tripleo_containers.yaml.j2"&gt;tripleo-common-j2&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id27" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id18"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/master/ci/environments/scenario004-standalone.yaml"&gt;tripleo-scenario004&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id28" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id10"&gt;10&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-common/+/824431"&gt;tripleo-common-split-off&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id29" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id16"&gt;11&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/q/topic:%22ceph_vip_provision%22+(status:open%20OR%20status:merged)"&gt;tripleo-ceph-vip&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Tue, 11 Jan 2022 00:00:00 </pubDate></item><item><title>TripleO Ceph Ganesha Integration for Manila</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/yoga/tripleo_ceph_manila.html</link><description>
 
&lt;p&gt;Starting in the Octopus release, Ceph introduced its own day1 tool called
cephadm and its own day2 tool called orchestrator which replaced ceph-ansible.
During the Wallaby and Xena cycles TripleO moved away from ceph-ansible and
adopted cephadm &lt;a class="footnote-reference brackets" href="#id16" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; as described in &lt;a class="footnote-reference brackets" href="#id17" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
However, the ganesha daemon deployment remained under tripleo-ansible
control, with a set of tasks that are supposed to replicate the relevant part
of the ceph-nfs ceph-ansible role &lt;a class="footnote-reference brackets" href="#id18" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
This choice ensured backward compatibility with the older releases.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;In TripleO we support deployment of Ganesha both when the Ceph cluster is
managed by TripleO and when it is not.
When the cluster is managed by TripleO, an NFS daemon can be deployed as a
regular TripleO service via the tripleo-ansible module &lt;a class="footnote-reference brackets" href="#id19" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
It is preferable to have cephadm manage the lifecycle of the NFS container
instead of deploying it with tripleo-ansible.
In order to do this we will require the following changes on both TripleO
and Manila:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The orchestrator provides an interface that should be used by Manila to
interact with the ganesha instances. The nfs orchestrator interface is
described in &lt;a class="footnote-reference brackets" href="#id20" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and can be used to manipulate the nfs daemon, as well
as create and delete exports.
In the past the ganesha configuration file was fully customized by
ceph-ansible; the orchestrator is going to have a set of overrides to
preserve backwards compatibility. This result is achieved by setting a
userconfig object that lives within the Ceph cluster &lt;a class="footnote-reference brackets" href="#id20" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. It’s going
to be possible to check, change and reset the nfs daemon config using
the same interface provided by the orchestrator &lt;a class="footnote-reference brackets" href="#id26" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;11&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployed NFS daemon is based on the watch_url mechanism &lt;a class="footnote-reference brackets" href="#id21" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;:
adopting a cephadm deployed ganesha instance requires the Manila driver
be updated to support this new approach. This work is described in &lt;a class="footnote-reference brackets" href="#id25" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;10&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The ceph-nfs daemon deployed by cephadm has its own HA mechanism, called
ingress, which is based on haproxy and keepalived &lt;a class="footnote-reference brackets" href="#id22" id="id10" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; so we would no
longer use pcmk as the VIP owner.
Note this means we would run pcmk and keepalived in addition to haproxy
(deployed by tripleo) and another haproxy (deployed by cephadm) on the
same server (though with listeners on different ports).
Because cephadm is controlling the ganesha life cycle, the pcs cli will
no longer be used to interact with the ganesha daemon and we will change
where the ingress daemon is used.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the Ceph cluster is &lt;em&gt;not&lt;/em&gt; managed by TripleO, the Ganesha service is
currently deployed standalone on the overcloud and it’s configured to use
the external Ceph MON and MDS daemons.
However, if this spec is implemented, then the standalone ganesha service
will no longer be deployed by TripleO. Instead, we will require that the
admin of the external ceph cluster add the ceph-nfs service to that cluster,
though TripleO will still configure Manila to use that service.&lt;/p&gt;
&lt;p&gt;Thus in the external case, Ganesha won’t be deployed and details about the
external Ganesha must be provided as input during overcloud deployment. We
will also provide tools to help someone who has deployed Ganesha on the
overcloud transition the service to their external Ceph cluster. At a high
level the process will be the following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Generate a cephadm spec so that after the external ceph cluster becomes
managed by cephadm the spec can be used to add the ceph-nfs service
with the required properties.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Disable the VIP PCS uses and provide a documented method for it to be
moved to the external ceph cluster.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;An ansible task will generate the Ceph NFS daemon spec and it will trigger
cephadm &lt;a class="footnote-reference brackets" href="#id17" id="id11" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; to deploy the Ganesha container.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;the NFS spec should be rendered and applied against the existing Ceph
cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the ingress spec should be rendered (as part of the NFS deployment)
and applied against the cluster&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The container will no longer be controlled by pacemaker.&lt;/p&gt;
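&lt;p&gt;As a rough sketch of the first step, an NFS service spec in the format accepted
by cephadm could look like the following. The service id, host names and port
are placeholders chosen for this example. The backend port is deliberately
different from the standard NFS port on the assumption that an ingress daemon
on the same hosts would listen there.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;---
service_type: nfs
service_id: cephfs        # placeholder id
placement:
  hosts:                  # placeholder host names
    - controller-0
    - controller-1
spec:
  port: 12049             # backend port used by the ganesha daemons
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;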
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None, the same code which TripleO would already use for the generation of
the Ceph cluster config and keyrings will be consumed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We will deprecate the ganesha managed by PCS so that it will still work
up until Z.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will provide playbooks which migrate from the old NFS service to the
new one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will assume these playbooks will be available in Z and run prior to
the upgrade to the next release.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;For fresh deployments, the existing input parameters will be reused to
drive the newer deployment tool.
For an existing environment, after the Ceph upgrade, the TripleO deployed
NFS instance will be stopped and removed by the migration playbook provided,
as well as the related pacemaker resources and constraints; cephadm will
be able to deploy and manage the new NFS instances, and the end user will
see a disruption in the NFS service.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;No changes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;“deployed ceph”: For the first implementation of this spec we’ll deploy
during overcloud deployment but we will aim to deliver this so that it
is compatible with “deployed ceph”. VIPs are provisioned with
&lt;cite&gt;openstack overcloud network vip provision&lt;/cite&gt; before
&lt;cite&gt;openstack overcloud network provision&lt;/cite&gt; and before
&lt;cite&gt;openstack overcloud node provision&lt;/cite&gt;, so an ingress VIP would be available in
advance and this could be done with “deployed ceph”.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;directord/task-core: We will ultimately need this implemented for the
directord/task-core tool but could start with ansible tasks added to
the tripleo_ceph role. Depending on the state of the directord/task-core
migration when we implement we might skip the ansible part, though we
could POC with it to get started.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Assuming the manila services are able to interact with Ganesha using the
watch_url mechanism, the NFS daemon can be generated as a regular Ceph
daemon using the spec approach provided by the tripleo-ansible module &lt;a class="footnote-reference brackets" href="#id19" id="id12" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="deployment-flow"&gt;
&lt;h3&gt;Deployment Flow&lt;/h3&gt;
&lt;p&gt;The deployment and configuration described in this spec will happen during
&lt;cite&gt;openstack overcloud deploy&lt;/cite&gt;, as described in &lt;a class="footnote-reference brackets" href="#id23" id="id13" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
This is consistent with how tripleo-ansible used to run during step2 to
configure these services. The tripleo-ansible tasks should be moved from a
pure ansible templating approach that generates the systemd unit according
to the input provided to a cephadm based daemon that can be configured with
the usual Ceph mgr config-key mechanism.
As described in the overview section, an ingress object will be defined and
deployed; it is expected to manage both the VIP and the HA for this
component.&lt;/p&gt;
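&lt;p&gt;As a rough illustration of the ingress object mentioned above, the following
sketch builds a service spec as a plain Python dictionary. The field names are
an assumption modeled on the cephadm NFS ingress documentation referenced in this
spec and are not taken from the tripleo-ansible module. The helper name, the
ports, the VIP and the host names are hypothetical placeholders.&lt;/p&gt;

```python
import json


def build_nfs_ingress_spec(cluster_id, virtual_ip, hosts):
    """Return a cephadm-style ingress spec for an NFS cluster (hypothetical helper).

    The keys below are assumed from the cephadm ingress documentation; the
    port numbers are illustrative defaults, not values mandated by this spec.
    """
    if not hosts:
        raise ValueError("at least one placement host is required")
    backend = "nfs." + cluster_id
    return {
        "service_type": "ingress",
        "service_id": backend,
        "placement": {"hosts": list(hosts)},
        "spec": {
            "backend_service": backend,  # the NFS service fronted by the ingress
            "frontend_port": 2049,       # port exposed on the VIP (placeholder)
            "monitor_port": 9000,        # status port (placeholder)
            "virtual_ip": virtual_ip,    # VIP managed by the ingress daemon
        },
    }


if __name__ == "__main__":
    spec = build_nfs_ingress_spec("cephfs", "192.0.2.10/24", ["ctrl-0", "ctrl-1"])
    print(json.dumps(spec, indent=2))
```

&lt;p&gt;Keeping the spec as data means the same structure could be rendered to a file
and applied by whatever tooling ends up deploying the daemon.&lt;/p&gt;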
&lt;/section&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;fmount&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;fultonj&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gfidente&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Change the tripleo-ansible module to support the Ceph ingress daemon
type&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a set of tasks to deploy both the nfs and the related ingress
daemons&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deprecate the pacemaker related configuration for ceph-nfs, including
pacemaker constraints between the manila-share service and ceph-nfs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create upgrade playbooks to transition from TripleO/pcmk managed nfs
ganesha to nfs daemons deployed by cephadm and managed by ceph orch&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This work depends on the manila spec &lt;a class="footnote-reference brackets" href="#id25" id="id14" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;10&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; that moves from dbus to the
watch_url approach&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The NFS daemon feature can be enabled at day1 and it will be tested against
the existing TripleO scenario004 &lt;a class="footnote-reference brackets" href="#id24" id="id15" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
As part of the implementation plan, the update of the existing heat templates
environment CI files, which contain the testing job parameters, is one of the
goals of this spec.
An important aspect of the job definition process is related to standalone vs
multinode.
As seen in the past, multinode can help catch issues that are not visible
in a standalone environment, but of course the job configuration can be improved
in the next cycles, and we can start with standalone testing, which is what is
present today in CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;No changes should be necessary to the TripleO documentation, as the described
interface remains unchanged.
However, we should provide upgrade instructions for pre-existing environments
that need to transition from TripleO/pcmk managed nfs ganesha to nfs daemons
deployed by cephadm and managed by ceph orch.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id16" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/ceph/ceph/tree/master/src/cephadm"&gt;cephadm&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id17" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id2"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id11"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ceph.html"&gt;tripleo-ceph&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id18" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_cephadm/tasks/ganesha"&gt;tripleo-ceph-ganesha&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id19" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id4"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id12"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/ansible_plugins/modules/ceph_mkspec.py"&gt;tripleo-ceph-mkspec&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id20" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id5"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id6"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephfs/fs-nfs-exports"&gt;tripleo-ceph-nfs&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id21" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id8"&gt;6&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf#L206"&gt;ganesha-watch_url&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id22" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id10"&gt;7&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.ceph.com/en/pacific/cephadm/nfs/#high-availability-nfs"&gt;cephadm-nfs-ingress&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id23" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id13"&gt;8&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/cephadm.html"&gt;tripleo-cephadm&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id24" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id15"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/master/ci/environments/scenario004-standalone.yaml"&gt;tripleo-scenario004&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id25" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;10&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id9"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id14"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/manila/+spec/cephfs-nfs-drop-dbus"&gt;cephfs-nfs-drop-dbus&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id26" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;11&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/ceph/ceph/pull/43504"&gt;cephfs-get-config&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Tue, 12 Oct 2021 00:00:00 </pubDate></item><item><title>Moving TripleO repos to independent release model</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/xena/tripleo-independent-release.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo"&gt;https://blueprints.launchpad.net/tripleo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec proposes that we move all tripleo repos to the independent release
model. The proposal was first raised during tripleo irc meetings &lt;a class="footnote-reference brackets" href="#id10" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and then
also on the openstack-discuss mailing list &lt;a class="footnote-reference brackets" href="#id11" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The TripleO repos &lt;a class="footnote-reference brackets" href="#id12" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; mostly follow the cycle-with-intermediary release
model, for example tripleo-heat-templates at &lt;a class="footnote-reference brackets" href="#id13" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. We say “mostly” because some of the
tripleo repos already use the independent release model, for example tripleo-upgrade
at &lt;a class="footnote-reference brackets" href="#id14" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. A description of the different release models can be found at &lt;a class="footnote-reference brackets" href="#id15" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By following the cycle-with-intermediary release model, TripleO is bound to
produce a release for each OpenStack development cycle and a corresponding
stable/branch in the tripleo repos. However as we have seen this causes an
ongoing maintenance burden; consider that currently TripleO supports 5
active branches - Train, Ussuri, Victoria, Wallaby and Xena (current master).
In fact until very recently that list contained 7 branches, including Stein
and Queens (currently moving to End Of Life &lt;a class="footnote-reference brackets" href="#id16" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;This creates an ongoing maintenance and resource burden where for each
branch we are backporting changes, implementing, running and maintaining
upstream CI and ensuring compatibility with the rest of OpenStack with 3rd
party CI and the component and integration promotion pipelines &lt;a class="footnote-reference brackets" href="#id17" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, on an
ongoing basis.&lt;/p&gt;
&lt;p&gt;Finally, changes in the underlying OS between branches mean that for some
branches we maintain two “types” of CI job; for stable/train we have to support
both Centos 7 and Centos 8. With the coming stable/xena, we would likely have
to support Centos-Stream-8 as well as Centos-Stream-9 in the event that
Stream-9 is not fully available by the xena release, which further compounds
the resource burden. By adopting the proposal laid out here we can choose to
skip the Xena branch thus avoiding this increased CI and maintenance cost.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The proposal is for all TripleO repos that are currently using the
cycle-with-intermediary release model to switch to independent. This will
allow us to choose to skip a particular release and more importantly skip
the creation of the given stable/branch on those repos.&lt;/p&gt;
&lt;p&gt;This would allow the TripleO community to focus our resources on those branches
that are most ‘important’ to us, namely the ‘FFU branches’. That is, the
branches that are part of the TripleO Fast Forward Upgrade chain (currently
these are Train -&amp;gt; Wallaby -&amp;gt; Z?). For example it is highly likely that we
would not create a Xena branch.&lt;/p&gt;
&lt;p&gt;Developers will be freed from having to backport changes across stable/branches
and this will have a dramatic effect on our upstream CI resource consumption
and maintenance burden.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We can continue to create all the stable/branches and use the same release
model we currently have. This would mean we would continue to have an increased
maintenance burden and would have to address that with increased resources.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;For upgrades it would mean that TripleO would no longer directly support all
OpenStack stable branches. So if we decide not to create stable/xena for example
then you cannot upgrade from wallaby to xena using TripleO. In some respects
this would more closely match reality since the focus of the active tripleo
developer community has typically been on ensuring the Fast Forward Upgrade
(e.g. train to wallaby) and less so on ensuring the point to point upgrade
between 2 branches.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;TripleO would no longer be able to deploy all versions of OpenStack. One idea
that was brought forth in the discussions around this topic thus far is that
we can attempt to address this by designating a range of git tags as compatible
with a particular OpenStack stable branch.&lt;/p&gt;
&lt;p&gt;For example if TripleO doesn’t create a stable/xena, but during the xena cycle
makes releases for the various TripleO repos then &lt;em&gt;those&lt;/em&gt; releases will be
compatible for deploying OpenStack stable/xena. We can maintain and publicise
a set of compatible tags for each of the affected repos (e.g.,
tripleo-heat-templates versions 15.0.0 to 15.999.999 are compatible with
OpenStack stable/xena).&lt;/p&gt;
&lt;p&gt;Some rules around tagging will help us. Generally we can keep doing what we
currently do with respect to tagging: for major.minor.patch (e.g. 15.1.1) in
the release tag, we will always bump major to signal a new stable branch.&lt;/p&gt;
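&lt;p&gt;The tag-range idea above can be sketched as a small lookup keyed on the major
version. This is only an illustration: the function names are hypothetical, and
the table holds just the one mapping given as an example in this spec (major
version 15 for stable/xena). Any further entries would have to be maintained
and published by the project.&lt;/p&gt;

```python
# Hypothetical compatibility table. Only the 15 -> xena entry comes from the
# example in this spec; other majors are deliberately left out.
MAJOR_TO_RELEASE = {15: "xena"}


def parse_tag(tag):
    """Split a major.minor.patch tag into a tuple of integers."""
    parts = tag.split(".")
    if len(parts) != 3:
        raise ValueError("expected major.minor.patch, got %r" % tag)
    return tuple(int(part) for part in parts)


def compatible_release(tag):
    """Return the OpenStack release a tag is compatible with, or None."""
    major, _minor, _patch = parse_tag(tag)
    return MAJOR_TO_RELEASE.get(major)


def latest_compatible_tag(tags, release):
    """Pick the highest tag whose major version maps to the given release."""
    candidates = [tag for tag in tags if compatible_release(tag) == release]
    # Compare numerically so that 15.10.2 sorts after 15.9.9.
    return max(candidates, key=parse_tag) if candidates else None


if __name__ == "__main__":
    tags = ["15.0.0", "15.9.9", "15.10.2", "16.0.0"]
    print(latest_compatible_tag(tags, "xena"))
```

&lt;p&gt;Because the major version is always bumped to signal a new stable branch,
the major component alone is enough to decide compatibility in this sketch.&lt;/p&gt;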
&lt;p&gt;One problem with this solution is that there is no place to backport fixes to.
For example if you are using tripleo-heat-templates 15.99.99 to deploy
OpenStack Xena (and there is no stable/xena for tht) then you’d have to apply
any fixes to the top of the 15.99.99 tag and use it. There would be no way
to commit these fixes into the code repo.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;There were concerns raised in the openstack-discuss thread [2] about RDO
packaging and how it would be affected by this proposal. As was discussed
there are no plans for RDO to stop building packages for any branch. For the
building of tripleo repos we would have to rely on the latest compatible
git tag, as outlined above in &lt;a class="reference internal" href="#other-end-user-impact"&gt;Other End User Impact&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will have fewer stable/branches to backport fixes to. It is important
to note, however, that when branches are skipped, a backport has to span multiple
releases, which produces a larger code diff and is harder for developers to
implement. That is, backports become more complex if
we skip intermediate branches.&lt;/p&gt;
&lt;p&gt;As noted in the &lt;a class="reference internal" href="#other-end-user-impact"&gt;Other End User Impact&lt;/a&gt; section above, for those branches that
tripleo decides not to create, there will be no place for developers to commit
any branch specific fixes to. They can consume particular tagged releases of
TripleO repos that are compatible with the given branch, but will not be able
to commit those changes to the upstream repo since the branch will not exist.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Wesley Hayutin &amp;lt;&lt;a class="reference external" href="mailto:weshay%40redhat.com"&gt;weshay&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;
Marios Andreou &amp;lt;&lt;a class="reference external" href="mailto:marios%40redhat.com"&gt;marios&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;Besides posting the review against the releases repo &lt;a class="footnote-reference brackets" href="#id18" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; we will need to
update documentation to reflect and inform about this change.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Yes, we will at least need to add some section to the docs to explain this.
We may also add some landing page to show the currently ‘active’ or supported
TripleO branches.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id10" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://meetings.opendev.org/meetings/tripleo/2021/tripleo.2021-05-25-14.00.html"&gt;Tripleo IRC meeting logs 25 May 2021&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id11" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-discuss/2021-June/thread.html#22959"&gt;openstack-discuss thread ‘[tripleo] Changing TripleO’s release model’&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id12" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/governance/src/commit/8dcac06d702ccff89b19c73b0c1d5ae7620b9a7b/reference/projects.yaml#L3044-L3177"&gt;TripleO section in governance projects.yaml&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id13" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id4"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/releases/src/commit/e1b3fa10962cefad3220ae41e1c81a0ae0fd0fd5/deliverables/wallaby/tripleo-heat-templates.yaml#L3"&gt;tripleo-heat-templates wallaby release file&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id14" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;5&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/releases/src/commit/e1b3fa10962cefad3220ae41e1c81a0ae0fd0fd5/deliverables/_independent/tripleo-upgrade.yaml"&gt;tripleo-upgrade independent release file&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id15" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id6"&gt;6&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://releases.openstack.org/reference/release_models.html"&gt;OpenStack project release models&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id16" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;7&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-discuss/2021-June/023409.html"&gt;openstack-discuss [TripleO] moving stable/stein and stable/queens to End of Life&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id17" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id8"&gt;8&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/tripleo-docs/latest/ci/stages-overview.html"&gt;TripleO Docs - TripleO CI Promotions&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id18" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id9"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/releases/"&gt;opendev.org openstack/releases git repo&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Tue, 20 Jul 2021 00:00:00 </pubDate></item><item><title>CI Team Structure</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/policy/ci-team-structure.html</link><description>
 
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The soft analysis over the past one to two years is that landing major new
features and function in CI is difficult while being interrupted by a constant
stream of issues.  Each individual is siloed in their own work, feature or
section of the production chain and there is very little time for thoughtful
peer review and collaborative development.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="policy"&gt;
&lt;h2&gt;Policy&lt;/h2&gt;
&lt;section id="goals"&gt;
&lt;h3&gt;Goals&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Increase developer focus, decrease distractions, interruptions, and time
slicing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Encourage collaborative team development.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better and faster code reviews&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="team-structure"&gt;
&lt;h3&gt;Team Structure&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The Ruck&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Rover&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Sprint Team&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="the-ruck"&gt;
&lt;h3&gt;The Ruck&lt;/h3&gt;
&lt;p&gt;One person per week will be on the front lines reporting failures found in CI.
The Ruck &amp;amp; Rover switch roles in the second week of the sprint.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Primary focus is to watch CI, report bugs, improve debug documentation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Does not participate in the sprint&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Attends the meetings where the team needs to be represented&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Responds to pings on  #oooq / #tripleo regarding CI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reviews and improves documentation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Attends meetings for the group where possible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For identification, use the irc nick $user|ruck&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="the-rover"&gt;
&lt;h3&gt;The Rover&lt;/h3&gt;
&lt;p&gt;The primary backup for the Ruck.  The Ruck should be catching all the issues
in CI and passing the issues to the Rover for more in depth analysis or
resolution of the bug.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Back up for the Ruck&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Workload is driven from the tripleo-quickstart bug queue, the Rover is
not monitoring CI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A secondary input for work is identified technical debt defined in the
Trello board.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Attends the sprint meetings, but is not responsible for any sprint work&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Helps to triage incoming gerrit reviews&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Responds to pings on irc #oooq / #tripleo&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the Ruck is overwhelmed with any of their responsibilities the
Rover is the primary backup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For identification, use the irc nick $user|rover&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="the-sprint-team"&gt;
&lt;h3&gt;The Sprint Team&lt;/h3&gt;
&lt;p&gt;The team is defined at the beginning of the sprint based on availability.
Members on the team should be as focused on the sprint epic as possible.
A member of the team should spend 80% of their time on sprint goals and 20%
on any other duties like code review or incoming high priority bugs that
the Rover cannot manage alone.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;hand off interruptions to the Ruck and Rover as much as possible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;focus as a team on the sprint epic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;collaborate with other members of the sprint team&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;seek out peer review regarding sprint work&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;keep the Trello board updated daily&lt;/dt&gt;&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;&lt;p&gt;One can point to Trello cards in stand up meetings for status&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="the-squads"&gt;
&lt;h3&gt;The Squads&lt;/h3&gt;
&lt;p&gt;The squads operate as a subunit of the sprint team.  Each squad will operate
with the same process and procedures and are managed by the team catalyst.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;Current Squads&lt;/dt&gt;&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;CI&lt;/dt&gt;&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Responsible for the TripleO CI system ( non-infra ) and build
verification.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;Tempest&lt;/dt&gt;&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Responsible for tempest development.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="team-leaders"&gt;
&lt;h2&gt;Team Leaders&lt;/h2&gt;
&lt;section id="the-team-catalyst-tc"&gt;
&lt;h3&gt;The team catalyst (TC)&lt;/h3&gt;
&lt;p&gt;The member of the team responsible for organizing the group. The team will elect or
appoint a team catalyst per release.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;organize and plan sprint meetings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;collect status and send status emails&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="the-user-advocate-ua"&gt;
&lt;h3&gt;The user advocate (UA)&lt;/h3&gt;
&lt;p&gt;The member of the team responsible for helping to prioritize work.  The team will
elect or appoint a user advocate per release.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;organize and prioritize the Trello board for the sprint planning&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;monitor the board during the sprint.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ensure the right work is being done.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="id1"&gt;
&lt;h3&gt;The Squads&lt;/h3&gt;
&lt;p&gt;There are two squads on the CI team.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;tripleo ci&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tempest development&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Each squad has a UA and they share a TC. Both contribute to Ruck and Rover rotations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="current-leaders-for-rocky"&gt;
&lt;h3&gt;Current Leaders for Rocky&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;team catalyst (ci, tempest) - Matt Young&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;user advocate (ci)          - Gabriele Cerami&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;user advocate (tempest)     - Chandan Kumar&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="sprint-structure"&gt;
&lt;h3&gt;Sprint Structure&lt;/h3&gt;
&lt;p&gt;The goal of the sprint is to define a narrow and focused feature called an epic
to work on in a collaborative way.  Work not completed in the sprint will be
added to the technical debt column of Trello.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Each sprint needs a clear definition of done that is documented in
the epic used for the sprint.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="sprint-start-day-1-2-5-hours"&gt;
&lt;h3&gt;Sprint Start ( Day 1 ) - 2.5 hours&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Sprints are three weeks in length&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A planning meeting is attended by the entire team including the Ruck and
Rover&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Review PTO&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Review any meetings that need to be covered by the Ruck/Rover&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The UA will present options for the sprint epic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Discuss the epic, lightly breaking each one down&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Vote on an epic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The vote can be done using a doodle form&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Break down the sprint epic into cards&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;Review each card&lt;/dt&gt;&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Each card must have a clear definition of done&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As a group, include enough detail in the card to inform an engineer
with little to no background in the task.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="sprint-end-day-15-2-5-hours"&gt;
&lt;h3&gt;Sprint End ( Day 15 ) - 2.5 hours&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;Retrospective&lt;/dt&gt;&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;&lt;p&gt;team members, ruck and rover only&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document any technical debt left over from the sprint&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ruck / Rover hand off&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Assign Ruck and Rover positions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sprint demo - when available&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Office hours on irc&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="scrum-meetings-30-min"&gt;
&lt;h3&gt;Scrum meetings - 30 Min&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Planning meeting, video conference&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sprint End, video and irc #oooq on freenode&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;2 live video conference meetings per week&lt;/dt&gt;&lt;dd&gt;&lt;ul&gt;
&lt;li&gt;&lt;p&gt;sprint stand up&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other days, post status to the team’s Trello board and/or cards&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="tripleoo-ci-community-meeting"&gt;
&lt;h3&gt;TripleO CI Community meeting&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A community meeting should be held once a week.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The meeting should ideally be conveniently scheduled immediately after
the TripleO community meeting on #tripleo (OFTC)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The CI meeting should be announced as part of the TripleO community meeting
to encourage participation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives-history"&gt;
&lt;h2&gt;Alternatives &amp;amp; History&lt;/h2&gt;
&lt;p&gt;In the past the CI team has worked as individuals or by pairing up for distinct
parts of the CI system and for certain features.  Neither has been
overwhelmingly successful for delivering features on a regular cadence.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Primary author: Wes Hayutin weshayutin at gmail&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Ronelle Landy rlandy at redhat&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Arx Cruz acruz at redhat&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sagi Shnaidman at redhat&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="milestones"&gt;
&lt;h2&gt;Milestones&lt;/h2&gt;
&lt;p&gt;This document is likely to evolve from the feedback discussed in sprint
retrospectives.  An in depth retrospective should be done at the end of each
upstream cycle.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;section id="trello"&gt;
&lt;h3&gt;Trello&lt;/h3&gt;
&lt;p&gt;A Trello board will be used to organize work. The team is expected to keep the
board and their cards updated on a daily basis.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://trello.com/b/U1ITy0cu/tripleo-ci-squad"&gt;https://trello.com/b/U1ITy0cu/tripleo-ci-squad&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="dashboards"&gt;
&lt;h3&gt;Dashboards&lt;/h3&gt;
&lt;p&gt;A number of dashboards are used to monitor the CI.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://cistatus.tripleo.org/"&gt;http://cistatus.tripleo.org/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://dashboards.rdoproject.org/rdo-dev"&gt;https://dashboards.rdoproject.org/rdo-dev&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://zuul-status.tripleo.org/"&gt;http://zuul-status.tripleo.org/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="team-notes"&gt;
&lt;h3&gt;Team Notes&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-ci-squad-meeting"&gt;https://etherpad.openstack.org/p/tripleo-ci-squad-meeting&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="bug-queue"&gt;
&lt;h3&gt;Bug Queue&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://tinyurl.com/yag6y9ne"&gt;http://tinyurl.com/yag6y9ne&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="revision-history"&gt;
&lt;h2&gt;Revision History&lt;/h2&gt;
&lt;table class="docutils align-default" id="id2"&gt;
&lt;caption&gt;&lt;span class="caption-text"&gt;Revisions&lt;/span&gt;&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Release Name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Rocky&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;April 16 2018&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0
Unported License. &lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Tue, 06 Jul 2021 00:00:00 </pubDate></item><item><title>Patch Abandonment</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/policy/patch-abandonment.html</link><description>
 
&lt;section id="goal"&gt;
&lt;h2&gt;Goal&lt;/h2&gt;
&lt;p&gt;Provide a basic policy that core reviewers can apply to outstanding reviews. As
always, it is up to the core reviewer’s discretion whether a patch should or
should not be abandoned. This policy is just a baseline with some basic rules.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;TripleO consists of many different projects in which many patches become stale
or simply forgotten. This can lead to problems when trying to review the
current patches for a given project.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="when-to-abandon"&gt;
&lt;h2&gt;When to Abandon&lt;/h2&gt;
&lt;p&gt;If a proposed patch has been marked -1 WIP by the author but has sat idle for
more than 180 days, a core reviewer should abandon the change with a reference
to this policy.&lt;/p&gt;
&lt;p&gt;If a proposed patch is submitted and given a -2 and the patch has sat idle for
90 days with no effort to address the -2, a core reviewer should abandon the
change with a reference to this policy.&lt;/p&gt;
&lt;p&gt;If a proposed patch becomes stale by ending up with a -1 from CI for 90 days
and no activity to resolve the issues, a core reviewer should abandon the
change with a reference to this policy.&lt;/p&gt;
&lt;p&gt;If a proposed patch with no activity for 90 days is in merge conflict, even
with a +1 from CI, a core reviewer should abandon the change with a reference
to this policy.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="when-not-to-abandon"&gt;
&lt;h2&gt;When NOT to Abandon&lt;/h2&gt;
&lt;p&gt;If a proposed patch has no feedback but is +1 from CI, a core reviewer should
not abandon such changes.&lt;/p&gt;
&lt;p&gt;If a proposed patch is given a -1 by a reviewer but the patch is +1 from CI and
not in merge conflict and the author becomes unresponsive for a few weeks,
reviewers can leave a reminder comment on the review to see if there is
still interest in the patch.  If the issues are trivial then anyone should feel
welcome to check out the change and resubmit it using the same change ID to
preserve original authorship. Core reviewers should not abandon such changes.&lt;/p&gt;
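&lt;p&gt;As a rough illustration, the abandonment rules in this policy can be expressed as a
small predicate. This is a minimal sketch and not an existing tool; the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Patch&lt;/span&gt;&lt;/code&gt; fields and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;should_abandon&lt;/span&gt;&lt;/code&gt; function are hypothetical names
chosen for the example, and the final decision remains at the core reviewer’s
discretion.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical snapshot of a review; field names are illustrative."""
    idle_days: int                # days since the last activity on the change
    wip: bool = False             # author marked the change -1 WIP
    has_minus_two: bool = False   # a core reviewer left a -2
    ci_vote: int = 0              # latest CI vote: -1, 0, or +1
    merge_conflict: bool = False  # change no longer merges cleanly


def should_abandon(patch):
    """Return True when one of the four abandonment rules applies."""
    # Rule 1: WIP and idle for more than 180 days.
    if patch.wip and patch.idle_days > 180:
        return True
    # Rule 2: -2 with no effort to address it for 90 days.
    if patch.has_minus_two and patch.idle_days >= 90:
        return True
    # Rule 3: -1 from CI with no activity for 90 days.
    if patch.ci_vote == -1 and patch.idle_days >= 90:
        return True
    # Rule 4: merge conflict with no activity for 90 days, even with +1 CI.
    if patch.merge_conflict and patch.idle_days >= 90:
        return True
    # Anything else (for example, +1 from CI and no feedback) is kept.
    return False
```

&lt;p&gt;A patch that is idle but passing CI and free of merge conflicts never matches a
rule in this sketch, which mirrors the cases in which core reviewers should not
abandon a change.&lt;/p&gt;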
&lt;/section&gt;
&lt;section id="restoration"&gt;
&lt;h2&gt;Restoration&lt;/h2&gt;
&lt;p&gt;Feel free to restore your own patches. If a change has been abandoned
by a core reviewer, anyone can request the restoration of the patch by
asking a core reviewer on IRC in #tripleo on OFTC or by sending a
request to the openstack-dev mailing list. Should the patch again
become stale it may be abandoned again.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternative-history"&gt;
&lt;h2&gt;Alternative &amp;amp; History&lt;/h2&gt;
&lt;p&gt;This topic was previously brought up on the openstack mailing list &lt;a class="footnote-reference brackets" href="#id4" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; along
with proposed code to use for automated abandonment &lt;a class="footnote-reference brackets" href="#id5" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. Similar policies are
used by the Puppet OpenStack group &lt;a class="footnote-reference brackets" href="#id6" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="author-s"&gt;
&lt;h3&gt;Author(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary author:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;aschultz&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bnemec&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="milestones"&gt;
&lt;h3&gt;Milestones&lt;/h3&gt;
&lt;p&gt;Pike-2&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id4" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2015-October/076666.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2015-October/076666.html&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id5" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/cybertron/tripleo-auto-abandon"&gt;https://github.com/cybertron/tripleo-auto-abandon&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id6" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/developer/puppet-openstack-guide/reviews.html#abandonment"&gt;https://docs.openstack.org/developer/puppet-openstack-guide/reviews.html#abandonment&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="revision-history"&gt;
&lt;h2&gt;Revision History&lt;/h2&gt;
&lt;table class="docutils align-default" id="id7"&gt;
&lt;caption&gt;&lt;span class="caption-text"&gt;Revisions&lt;/span&gt;&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Release Name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Pike&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Introduced&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0
Unported License.
&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Tue, 06 Jul 2021 00:00:00 </pubDate></item><item><title>Unifying TripleO Orchestration with Task-Core and Directord</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/yoga/taskcore-directord.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/unified-orchestration"&gt;https://blueprints.launchpad.net/tripleo/+spec/unified-orchestration&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The purpose of this spec is to introduce core concepts around Task-Core and
Directord, explain their benefits, and cover why the project should migrate
from using Ansible to using Directord and Task-Core.&lt;/p&gt;
&lt;p&gt;TripleO has long been established as an enterprise deployment solution for
OpenStack. Different task executions have been used at different times.
Originally, os-collect-config was used, then the switch to Ansible was
completed. A new task execution environment will enable moving forward
with a solution designed around the specific needs of TripleO.&lt;/p&gt;
&lt;p&gt;The tools being introduced are Task-Core and Directord.&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;a class="reference external" href="https://github.com/mwhahaha/task-core"&gt;Task-Core&lt;/a&gt;:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;A dependency management and inventory graph solution which allows operators
to define tasks in simple terms with robust dominion over a given
environment. Declarative dependencies will ensure that if a container/config
is changed, only the necessary services are reloaded/restarted. Task-Core
provides access to the right tools for a given job with provenance, allowing
operators and developers to define outcomes confidently.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;a class="reference external" href="https://github.com/cloudnull/directord"&gt;Directord&lt;/a&gt;:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;A deployment framework built to manage the data center life cycle, which is
both modular and fast. Directord focuses on consistently maintaining
deployment expectations with a near real-time level of &lt;a class="reference external" href="https://directord.com/overview.html#comparative-analysis"&gt;performance&lt;/a&gt; at almost
any scale.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Task execution in TripleO is:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Slow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Resource intensive&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Complex&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Defined in a static and sequential order&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not optimized for scale&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;TripleO presently uses Ansible to achieve its task execution orchestration
goals. While the TripleO tooling around Ansible (playbooks, roles, modules,
plugins) has worked and is likely to continue working should maintainers bear
an increased burden, future changes around direction due to &lt;a class="reference external" href="https://ansible-runner.readthedocs.io/en/latest/execution_environments.html"&gt;Ansible Execution
Environments&lt;/a&gt; provide an inflection point. These upstream changes within
Ansible, where it is fundamentally moving away from the TripleO use case, force
TripleO maintainers to take on more ownership for no additional benefit. The
TripleO use case is actively working against the future direction of Ansible.&lt;/p&gt;
&lt;p&gt;Further, the Ansible lifecycle has never matched that of TripleO. A single
consistent and backwards compatible Ansible version cannot be used across a
single version of TripleO unless the tripleo-core team commits either to
maintaining that version of Ansible or to updating the Ansible version in a stable
TripleO release. The cost of maintaining a tool such as Ansible that the core team
does not own is high compared with switching to custom tools designed
specifically for the TripleO use case.&lt;/p&gt;
&lt;p&gt;The additional cost of maintaining Ansible as the task execution engine for
TripleO has a high likelihood of causing a significant disruption to the
TripleO project; this is especially true as the project looks to support future
OS versions.&lt;/p&gt;
&lt;p&gt;Presently, there are diminishing benefits that can be realized from any
meaningful performance, scale, or configurability improvements. The
simplification efforts and work around custom Ansible strategies and plugins
have reached a conclusion in terms of returns.&lt;/p&gt;
&lt;p&gt;Other framework changes that expose scaling mechanisms, such as using
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--limit&lt;/span&gt;&lt;/code&gt; or partitioning the Ansible execution across multiple stacks or
roles, do help with the scaling problem. However, they are workarounds, as they
do not directly address the inherent scaling issues with task executions.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;To make meaningful task execution orchestration improvements, TripleO must
simplify the framework with new tools, enable developers to build intelligent
tasks, and provide meaningful performance enhancements that scale to meet
operators’ expectations. If TripleO can capitalize on this moment, it will
improve the quality of life for day one deployers and day two operations and
upgrades.&lt;/p&gt;
&lt;p&gt;The proposal is to replace all usage of Ansible with Directord for task
execution, and add the usage of Task-Core for dynamic task dependencies.&lt;/p&gt;
&lt;p&gt;In some ways, the move toward Task-Core and Directord creates a
&lt;a class="reference external" href="https://xkcd.com/974"&gt;General-Problem&lt;/a&gt;, as it’s proposing the replacement of many bespoke tools, which
are well known, with two new homegrown ones. Be that as it may, much attention
has been given to the user experience, addressing many well-known pain points
commonly associated with TripleO environments, including: scale, barrier to
entry, execution times, and the complex step process.&lt;/p&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;This specification consists of two parts that work together to achieve the
project goals.&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Task-Core:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Task-Core builds upon native OpenStack libraries to create a dependency graph
and executes a compiled solution. With Task-Core, TripleO will be able to
define a deployment with dependencies instead of brute-forcing one. While
powerful, Task-Core keeps development easy and consistent, reducing the time
to deliver and allowing developers to focus on their actual deliverable, not
the orchestration details. Task-Core also guarantees reproducible builds,
runtime awareness, and the ability to resume when issues are encountered.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Templates containing step-logic and ad-hoc tasks will be refactored into
Task-Core definitions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Each component can have its own Task-Core purpose, providing resources and
allowing other resources to depend on it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The invocation of Task-Core will be baked into the TripleO client, it will
not have to be invoked as a separate deployment step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advanced users will be able to use Task-Core to meet their environment
expectations without fully understanding the deployment nuance of multiple
bespoke systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Employs a validation system around inputs to ensure they are correct before
starting the deployment. While the validation won’t ensure an operational
deployment, it will remove some issues caused by incorrect user input, such
as missing dependent services or duplicate services, providing early feedback
to deployers so they’re able to make corrections before running longer
operations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
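&lt;p&gt;The idea of defining a deployment through dependencies, with inputs validated
before execution, can be sketched with the Python standard library. This is only
an illustration of the general technique and does not use or represent the
Task-Core API; the task names and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;validate&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;execution_batches&lt;/span&gt;&lt;/code&gt; helpers
are hypothetical.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical tasks mapped to the set of tasks each one depends on.
DEPENDENCIES = {
    "host_prep": set(),
    "install_packages": {"host_prep"},
    "database": {"install_packages"},
    "messaging": {"install_packages"},
    "keystone": {"database", "messaging"},
}


def validate(dependencies):
    """Reject definitions that depend on a task nobody defines."""
    defined = set(dependencies)
    for task, needs in dependencies.items():
        missing = needs - defined
        if missing:
            raise ValueError(f"{task} depends on undefined {sorted(missing)}")


def execution_batches(dependencies):
    """Group tasks into ordered batches; a batch has no internal ordering."""
    validate(dependencies)
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

&lt;p&gt;In this sketch, tasks that land in the same batch have no ordering requirement
between them, so an executor would be free to run them in parallel, while a
missing dependency is reported before any work starts.&lt;/p&gt;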
&lt;dl class="simple"&gt;
&lt;dt&gt;Directord:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Directord provides a modular execution platform that is aware of managed
nodes. Because Directord leverages messaging, the platform can guarantee
availability, transport, and performance. Directord has been built from the
ground up, making use of industry-standard messaging protocols which ensure
pseudo-real-time performance and limited resource utilization. The built-in
DSL provides most of what the TripleO project will require out of the box.
Because no solution is perfect, Directord utilizes a plugin system that will
allow developers to create new functionality without compromise or needing to
modify core components. Additionally, plugins are handled the same, allowing
Directord to ensure the delivery and execution performance remain consistent.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Directord is a single application that is ideally suited for containers while
also providing native hooks into systems; this allows Directord to operate in
heterogeneous environments. Because Directord is a simplified application,
operators can choose how they want to run it and are not forced into a one size
fits all solution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Directord is platform-agnostic, allowing it to run across systems, versions,
and network topologies while simultaneously guaranteeing it maintains the
smallest possible footprint.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Directord is built upon messaging, giving it the unique ability to span
network topologies with varying latencies; messaging protocols compensate for
high latency environments and will finally give TripleO the ability to address
multiple data-centers and fully embrace “the edge.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Directord client/server communication is secured (TLS, etc) and encrypted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Directord node management addresses unreachable or flapping clients.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With Task-Core and Directord, TripleO will have an intelligent dependency graph
that is both easy to understand and extend. TripleO will now be aware of things
like service dependencies, making it possible to run day two operations quickly
and more efficiently (e.g., update and restart only dependent services).
Finally, TripleO will shrink its maintenance burden by eliminating Ansible.&lt;/p&gt;
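&lt;p&gt;To make the day two example concrete, the following sketch computes which
services would need attention after a single change by walking a dependency map
in reverse. It assumes a simple in-memory mapping and is not how Task-Core or
Directord represent dependencies; the service names and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;affected_by&lt;/span&gt;&lt;/code&gt; helper
are hypothetical.&lt;/p&gt;

```python
from collections import deque

# Hypothetical map of each service to the resources it depends on.
DEPENDS_ON = {
    "haproxy_config": [],
    "haproxy": ["haproxy_config"],
    "mariadb": [],
    "keystone": ["mariadb", "haproxy"],
    "nova_api": ["keystone"],
}


def affected_by(changed, depends_on):
    """Return everything that directly or transitively depends on 'changed'."""
    # Invert the map: for each resource, list the services that need it.
    dependents = {name: [] for name in depends_on}
    for name, needs in depends_on.items():
        for need in needs:
            dependents[need].append(name)

    # Breadth-first walk over the inverted edges, starting at the change.
    seen = set()
    queue = deque([changed])
    while queue:
        for name in dependents[queue.popleft()]:
            if name not in seen:
                seen.add(name)
                queue.append(name)
    return sorted(seen)
```

&lt;p&gt;With this map, a change to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mariadb&lt;/span&gt;&lt;/code&gt; touches only &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;keystone&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nova_api&lt;/span&gt;&lt;/code&gt;, and
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;haproxy&lt;/span&gt;&lt;/code&gt; is left alone.&lt;/p&gt;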
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Stay the course with Ansible&lt;/p&gt;
&lt;p&gt;Continuing with Ansible for task execution means that the TripleO core team
embraces maintaining Ansible for the specific TripleO use case. Additionally,
the TripleO project begins documenting the scale limitations and the boundaries
that exist due to the nature of task execution. Focus needs to shift to the
maintenance necessary to meet the functional expectations of TripleO.  Specific
Ansible versions also need to be maintained beyond their upstream lifecycle.
This maintenance would likely include maintaining an Ansible branch where
security and bug fixes could be backported, with our own project CI to validate
functionality.&lt;/p&gt;
&lt;p&gt;TripleO could also embrace the use of &lt;a class="reference external" href="https://ansible-runner.readthedocs.io/en/latest/execution_environments.html"&gt;Ansible Execution Environments&lt;/a&gt; through
continued investigative efforts. Although, if TripleO is already maintaining
Ansible, this would not be strictly required.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Task-Core and Directord are two new tools and attack surfaces, which will
require a new security assessment to be performed to ensure the tooling
exceeds the standard already set. That said, steps have already been taken to
ensure the new proposed architecture is &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Federal_Information_Processing_Standards"&gt;FIPS&lt;/a&gt; compatible, and enforces
&lt;a class="reference external" href="https://directord.com/drivers.html"&gt;transport encryption&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Directord also uses &lt;a class="reference external" href="https://pypi.org/project/ssh-python"&gt;ssh-python&lt;/a&gt; for bootstrapping tasks.&lt;/p&gt;
&lt;p&gt;Ansible will be removed, and will no longer have a security impact within
TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;The undercloud can be upgraded in place to use Directord and Task-Core. There
will be upgrade tasks that will migrate the undercloud as necessary to use the
new tools.&lt;/p&gt;
&lt;p&gt;The overcloud can also be upgraded in place with the new tools. Upgrade tasks
will be migrated to use the Directord DSL just like deployment tasks. This spec
proposes no changes to the overcloud architecture itself.&lt;/p&gt;
&lt;p&gt;As part of the upgrade task migration, the tasks can be rewritten to take
advantage of the new features exposed by these tools. With the introduction of
Task-Core, upgrade tasks can use well-defined dependencies for dynamic
ordering. Just like deployment, update/upgrade times will be decreased due to
the anticipated performance increases.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;When following the &lt;a class="reference external" href="https://xkcd.com/85"&gt;happy path&lt;/a&gt;, end users, deployers, and operators will
not interact with this change as the user interface will effectively remain the
same. However, the user experience will change. Operators accustomed to Ansible
tasks, logging, and output will instead need to become familiar with those
same aspects of Directord and Task-Core.&lt;/p&gt;
&lt;p&gt;If an operator wishes to leverage the advanced capabilities of either
Task-Core or Directord, the tooling will have documented end-user interfaces
for features such as custom components and orchestrations.&lt;/p&gt;
&lt;p&gt;It should be noted that there’s a change in deployment architecture in that
Directord follows a server/client model, albeit an ephemeral one. This change
aims to be fully transparent; however, it is something that end users and
deployers will need to be aware of.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;This specification will have a positive impact on performance.  Due to the
messaging architecture of Directord, near-realtime task execution will be
possible in parallel across all nodes.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://directord.com/overview.html#comparative-analysis"&gt;Performance&lt;/a&gt; analysis has been done comparing configurability and runtime of
Directord vs. Ansible, the TripleO default orchestration tool. This analysis
highlights some of the performance gains this specification will provide;
initial testing suggests that Task-Core and Directord are more than 10x
faster than our current tool chain, representing a potential 90% time savings
in just the task execution overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;One of the goals of this specification is to remove impediments in the time
to work. Deployers should not be spending exorbitant time waiting for tools to
do work; in some cases, waiting longer for a worker to be available than it
would take to perform a task manually.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improvements come from being able to execute more efficiently in parallel.
The Ansible strategy work allowed us to run tasks from a given Ansible play in
parallel across the nodes. However, execution was effectively limited to a
single play per node. The granularity was limited to a play, such that an
Ansible play with 100 items of work for one role and 10 items of work for
another would be run in parallel on the nodes. The role with 10 items
of work would likely finish first, and the overall execution would have to
wait until the entire play was completed everywhere. The long pole for a
play’s execution is the node with the largest set of tasks.  With the transition
to Task-Core and Directord, the overall unit of work is an orchestration,
which may have 5 tasks. If we take the same 100 tasks for one role and split
them up into 20 orchestrations that can be run in parallel, and split the 10
items of work into two orchestrations for the other role, we are able to better
execute the work in parallel when there are no specific ordering
requirements. Improvements are expected around host prep tasks and other
services where we do not have specific ordering requirements. Today these
tasks get put in a random spot within a play and have to wait on other
unrelated tasks to complete before being run.  We expect there to be less
execution overhead time per the other items in this section; however, the
overall improvements are limited by how well we can remove unnecessary
ordering requirements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployers will no longer be required to run a massive server for medium-scale
deployment. Regardless of size, the memory footprint and compute cores needed
to execute a deployment will be significantly reduced.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Task-Core and Directord represent an unknown factor; as such, they are
&lt;strong&gt;not&lt;/strong&gt; battle-tested and will create uncertainty in an otherwise “&lt;a class="reference external" href="https://xkcd.com/1343"&gt;stable&lt;/a&gt;”
project.&lt;/p&gt;
&lt;p&gt;Deployers will see time savings when doing deployments.  Deployers who
implement new services will need to do so with Directord and Task-Core.&lt;/p&gt;
&lt;p&gt;Extensive testing has been done;
all known use-cases, from system-level configuration to container pod
orchestration, have been covered, and automated tests have been created to
ensure nothing breaks unexpectedly. Additionally, for the first time, these
projects have expectations on performance, with tests backing up those claims,
even at a large scale.&lt;/p&gt;
&lt;p&gt;At present, TripleO assumes SSH access between the Undercloud and
Overcloud is always present. Additionally, TripleO assumes the infrastructure
is relatively static, making day two operations risky and potentially painful.
Task-Core will reduce the computational burden when crafting action plans, and
Directord will ensure actions are always performed against the functional
hosts.&lt;/p&gt;
&lt;p&gt;This specification also improves vendor integrations. Vendors will be able
to provide meaningful task definitions which
leverage an intelligent inventory and dependency system. No longer will TripleO
require vendors to have in-depth knowledge of every deployment detail, even those
outside of the scope of their deliverable. Easing the job definitions,
simplifying the development process, and speeding up the execution of tasks are
all positive impacts on deployers.&lt;/p&gt;
&lt;p&gt;Test clouds are still highly recommended sources of information; however,
system requirements on the Undercloud will be reduced. By reducing the resources
required to operate the Undercloud, the cost of test environments, in terms of
both hardware and time, will be significantly lowered. With a lower barrier to
entry, developers and operators alike will be able to contribute more easily to
the overall project.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;To fully realize the benefits of this specification, Ansible tasks will need to
be refactored into the Task-Core scheme. While Task-Core can run Ansible and
Directord has a plugin system which easily allows developers to port legacy
modules into Directord plugins, there will be a developer impact as the TripleO
development methodology will change. The potential developer impact is large,
yet the shift isn’t monumental. Much of the
Ansible presently in TripleO is shell-oriented and, as such, is easily
portable. As stated, compatibility layers exist that allow the TripleO project
to make the required shift gradually. Once the Ansible tasks are
ported, the time saved in execution will be significant.&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Example &lt;a class="reference external" href="https://raw.githubusercontent.com/mwhahaha/task-core/main/examples/directord/services/openstack-keystone.yaml"&gt;Task-Core and Directord implementation for Keystone&lt;/a&gt;:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;While this implementation example is fairly basic, it does result in a
functional Keystone environment in roughly 5 minutes and includes
services like MySQL, RabbitMQ, and Keystone, as well as ensuring that the
operating system is set up and configured for a cloud execution environment.
The most powerful aspect of this example is the inclusion of the graph
dependency system, which will allow us to easily externalize services.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
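&lt;p&gt;To make the idea of a graph dependency system more concrete, the sketch
below orders the services named in the example using the Python standard
library. It is not Task-Core code: the dependency map is a hypothetical
illustration, and the real project defines services and their requirements in
its own YAML format.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists what must be ready before it.
requires = {
    "os-setup": set(),
    "mysql": {"os-setup"},
    "rabbitmq": {"os-setup"},
    "keystone": {"mysql", "rabbitmq"},
}


def execution_waves(graph):
    """Group services into waves; everything in one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(requires))
# [['os-setup'], ['mysql', 'rabbitmq'], ['keystone']]
```

&lt;p&gt;Because ordering is derived from declared requirements rather than from
position in a playbook, a service that is provided externally can, in
principle, be satisfied without changing the services that depend on it.&lt;/p&gt;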
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The use of advanced messaging protocols instead of SSH means TripleO can more
efficiently address deployments in local data centers or at the edge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Directord server and storage can be easily offloaded, making it possible
for the TripleO Client to be executed from simple environments without access
to the overcloud network; imagine running a massive deployment from a laptop.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;In terms of essential TripleO integration, most of the work will occur within
the &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient"&gt;tripleoclient&lt;/a&gt;, with the following new workflow.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-heat-templates/+/798747"&gt;Execution Workflow&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;┌────┐   ┌─────────────┐   ┌────┐   ┌─────────┐   ┌─────────┬──────┐   ???????????
│USER├──►│TripleOclient├──►│Heat├──►│Task-Core├──►│Directord│Server├──►? Network ?
└────┘   └─────────────┘   └────┘   └─────────┘   └─────────┴──────┘   ???????????
                ▲                                             ▲             ▲
                │                       ┌─────────┬───────┐   |             |
                └──────────────────────►│Directord│Storage│◄──┘             |
                                        └─────────┴───────┘                 |
                                                                            |
                                                  ┌─────────┬──────┐        |
                                                  │Directord│Client│◄───────┘
                                                  └─────────┴──────┘
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Directord|Server - Task executor connecting to client.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Directord|Client - Client program running on remote hosts connecting back to
the Directord|Server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Directord|Storage - An optional component. When storage is not
externalized, Directord maintains the runtime storage internally; in this
configuration Directord is ephemeral.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To enable a gradual transition, &lt;a class="reference external" href="https://github.com/ansible/ansible-runner"&gt;ansible-runner&lt;/a&gt; has been implemented within
Task-Core, allowing the TripleO project to convert playbooks into tasks that
rely upon strongly typed dependencies without requiring a complete rewrite. The
initial implementation should be transparent. Once the Task-Core hooks are set
within &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient"&gt;tripleoclient&lt;/a&gt;, functional groups can then convert their &lt;a class="reference external" href="https://github.com/openstack/tripleo-ansible"&gt;tripleo-ansible&lt;/a&gt;
roles or ad-hoc Ansible tasks into Directord orchestrations. Teams will have
the flexibility to transition code over time and are incentivized by a
significantly improved user experience and shorter time to delivery.&lt;/p&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Cloudnull - Kevin Carter&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mwhahaha - Alex Schultz&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Slagle - James Slagle&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;???&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Migrate Directord and Task-Core to the OpenStack namespace.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Package all of Task-Core, Directord, and dependencies for PyPI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RPM package all of Task-Core, Directord, and dependencies for RDO.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Directord container image build integration within TripleO / tcib.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Converge on a Directord deployment model (container, system, hybrid).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement the Task-Core code path within TripleO client.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Port in-template Ansible tasks to Directord orchestrations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Port Ansible roles into Directord orchestrations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Port Ansible modules and actions into pure Python or Directord components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Port Ansible workflows in tripleoclient into pure Python or Directord
orchestrations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migration tooling for Heat templates, Ansible roles/modules/actions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Port Ansible playbook workflows in tripleoclient to pure Python or
Directord orchestrations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Undercloud upgrade tasks to migrate to the Directord + Task-Core architecture.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Overcloud upgrade tasks to enable Directord client bootstrapping.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Both Task-Core and Directord are dependencies, as they’re new projects. These
dependencies may or may not be brought into the OpenStack namespace;
regardless, both of these projects, and their associated dependencies, will
need to be packaged and provided for by RDO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;If successful, the implementation of Task-Core and Directord will leave the
existing testing infrastructure unchanged. TripleO will continue to function as
it currently does through the use of the &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient"&gt;tripleoclient&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;New tests will be created to ensure the Task-Core and Directord components
remain functional and provide an SLA around performance and configurability
expectations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation around Ansible will need to be refactored.&lt;/p&gt;
&lt;p&gt;New documentation will need to be created to describe the advanced
usage of Task-Core and Directord. Much of the client interactions from the
“&lt;a class="reference external" href="https://xkcd.com/85"&gt;happy path&lt;/a&gt;” will remain unchanged.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Directord official documentation &lt;a class="reference external" href="https://directord.com"&gt;https://directord.com&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ansible’s decision to pivot to execution environments:
&lt;a class="reference external" href="https://ansible-runner.readthedocs.io/en/latest/execution_environments.html"&gt;https://ansible-runner.readthedocs.io/en/latest/execution_environments.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Mon, 28 Jun 2021 00:00:00 </pubDate></item><item><title>Support Keystoneless Undercloud (basic auth or noauth)</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/xena/keystoneless-undercloud.html</link><description>
 
&lt;p&gt;The goal of this proposal is to introduce the community to the idea of
removing Keystone from the TripleO undercloud and running the remaining OpenStack
services either with basic authentication or noauth (i.e. Standalone mode).&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;With the goal of having a thin undercloud, we’ve been simplifying the
undercloud architecture over the past few cycles and have removed a number
of OpenStack services. After moving to use &lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/triplo-network-data-v2-node-ports.html"&gt;network_data_v2&lt;/a&gt; and
&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/ephemeral-heat-overcloud.html"&gt;ephemeral_heat&lt;/a&gt; by default, we are left only with neutron, ironic
and ironic-inspector services.&lt;/p&gt;
&lt;p&gt;Keystone authentication and authorization do not add a lot of value to the
undercloud. We use the &lt;cite&gt;admin&lt;/cite&gt; user and the &lt;cite&gt;admin&lt;/cite&gt; project for everything. There are
also a few service users (one per service) for communication between services.
Most of the overcloud deployment and configuration is done as the os user.
Also, for large deployments we increase the token expiration time to a large
value, which is orthogonal to keystone security.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;At present, we have keystone running in the undercloud providing catalog and
authentication/authorization services to the remaining deployed services:
neutron, ironic and ironic-inspector. Ephemeral heat uses a fake keystone
client which does not talk to keystone.&lt;/p&gt;
&lt;p&gt;All these remaining services are capable of running standalone using either
&lt;cite&gt;http_basic&lt;/cite&gt; or &lt;cite&gt;noauth&lt;/cite&gt; auth_strategy and clients using openstacksdk and
keystoneauth can use &lt;cite&gt;HTTPBasicAuth&lt;/cite&gt; or &lt;cite&gt;NoAuth&lt;/cite&gt; identity plugins with the
standalone services.&lt;/p&gt;
&lt;p&gt;The proposal is to deploy these OpenStack services either with basic auth or
noauth and remove keystone from the undercloud by default.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Deploy ironic/ironic-inspector/neutron with &lt;cite&gt;http_basic&lt;/cite&gt; (default) or &lt;cite&gt;noauth&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This would also allow us to remove some additional services from the
undercloud, such as &lt;cite&gt;memcached&lt;/cite&gt;, which is mainly used for authtoken caching.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Keep keystone in the undercloud as before.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;There should not be any significant security implications from disabling keystone
on the undercloud, as there are no multi-tenancy or RBAC requirements for
undercloud users/operators. Deploying baremetal and networking services with
&lt;cite&gt;http_basic&lt;/cite&gt; authentication would protect against any possible intrusion as before.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;There will be no upgrade impact; this change will be transparent to the
end-user.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Disabling authentication and authorization would make API calls faster and
reduce the overall resource requirements of the undercloud.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add THT support for configuring &lt;cite&gt;auth_strategy&lt;/cite&gt; for ironic and neutron
services and manage htpasswd files used for basic authentication by the
ironic services.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;IronicAuthStrategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;http_basic&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;NeutronAuthStrategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;http_basic&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Normally, Identity service middleware provides an X-Project-Id header based on
the authentication token submitted by the service client. However, when keystone
is not available, neutron expects &lt;cite&gt;project_id&lt;/cite&gt; in the &lt;cite&gt;POST&lt;/cite&gt; requests (i.e. the create
API). Also, metalsmith communicates with &lt;cite&gt;neutron&lt;/cite&gt; to create &lt;cite&gt;ctlplane&lt;/cite&gt; ports for
instances.&lt;/p&gt;
&lt;p&gt;Add a middleware to the neutron API &lt;cite&gt;http_basic&lt;/cite&gt; pipeline to inject a fake project_id
into the context.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add basic authentication middleware to oslo.middleware and use it for undercloud
neutron.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create/Update clouds.yaml to use &lt;cite&gt;auth_type: http_basic&lt;/cite&gt; and use endpoint overrides
for the public endpoints with &lt;cite&gt;&amp;lt;service_name&amp;gt;_endpoint_override&lt;/cite&gt; entries. We
would leverage the &lt;cite&gt;EndpointMap&lt;/cite&gt; and change &lt;cite&gt;extraconfig/post_deploy&lt;/cite&gt; to create
and update clouds.yaml.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;clouds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;undercloud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;piJsuvz3lKUtCInsiaQd4GZ1w&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;admin&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;auth_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;http_basic&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;baremetal_api_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;'1'&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;baremetal_endpoint_override&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;https://192.168.24.2:13385&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;baremetal_introspection_endpoint_override&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;https://192.168.24.2:13050&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;network_api_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;'2'&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;network_endpoint_override&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;https://192.168.24.2:13696&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
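&lt;p&gt;For readers unfamiliar with what &lt;cite&gt;http_basic&lt;/cite&gt; puts on the wire, the
sketch below builds the Authorization header that HTTP Basic authentication
uses and shows how a server would decode it. It relies only on the Python
standard library and is not the keystoneauth or oslo.middleware implementation;
the helper names and the placeholder password are assumptions made for
illustration.&lt;/p&gt;

```python
import base64


def basic_auth_header(username, password):
    """Build the header for HTTP Basic auth: base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def decode_basic_auth(header_value):
    """Server-side counterpart: recover (username, password) from the header."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split only on the first colon; the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password


# Placeholder credentials; a real client would read them from clouds.yaml.
headers = basic_auth_header("admin", "example-password")
print(headers["Authorization"])
print(decode_basic_auth(headers["Authorization"]))
```

&lt;p&gt;Since the credentials are only encoded and not encrypted, this scheme depends
on the TLS endpoints shown in the endpoint overrides above for confidentiality.&lt;/p&gt;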
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;ramishra&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Other contributors:&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add basic authentication middleware in oslo.middleware
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/oslo.middleware/+/802234"&gt;https://review.opendev.org/c/openstack/oslo.middleware/+/802234&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support &lt;cite&gt;auth_strategy&lt;/cite&gt; with ironic and neutron services
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-heat-templates/+/798241"&gt;https://review.opendev.org/c/openstack/tripleo-heat-templates/+/798241&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Neutron middleware to add fake project_id to noauth pipeline
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/neutron/+/799162"&gt;https://review.opendev.org/c/openstack/neutron/+/799162&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure neutron paste deploy for basic authentication
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-heat-templates/+/804598"&gt;https://review.opendev.org/c/openstack/tripleo-heat-templates/+/804598&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Disable keystone by default
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-heat-templates/+/794912"&gt;https://review.opendev.org/c/openstack/tripleo-heat-templates/+/794912&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add option to enable keystone if required
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/python-tripleoclient/+/799409"&gt;https://review.opendev.org/c/openstack/python-tripleoclient/+/799409&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other patches:
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-ansible/+/796991"&gt;https://review.opendev.org/c/openstack/tripleo-ansible/+/796991&lt;/a&gt;
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-common/+/796825"&gt;https://review.opendev.org/c/openstack/tripleo-common/+/796825&lt;/a&gt;
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-ansible/+/797381"&gt;https://review.opendev.org/c/openstack/tripleo-ansible/+/797381&lt;/a&gt;
&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-heat-templates/+/799408"&gt;https://review.opendev.org/c/openstack/tripleo-heat-templates/+/799408&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Ephemeral heat and network-data-v2 are used as defaults.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Update the undercloud installation and upgrade guides.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/triplo-network-data-v2-node-ports.html"&gt;network_data_v2&lt;/a&gt; specification&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/ephemeral-heat-overcloud.html"&gt;ephemeral_heat&lt;/a&gt; specification&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 16 Jun 2021 00:00:00 </pubDate></item><item><title>Deploy whole disk images by default</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/xena/whole-disk-default.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/whole-disk-default"&gt;https://blueprints.launchpad.net/tripleo/+spec/whole-disk-default&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This blueprint tracks the tasks required to switch to whole-disk overcloud
images by default instead of the current overcloud-full partition image.&lt;/p&gt;
&lt;section id="whole-disk-images-vs-partition-images"&gt;
&lt;h2&gt;Whole disk images vs partition images&lt;/h2&gt;
&lt;p&gt;The current overcloud-full partition image consists of the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A compressed qcow2 image file which contains a single root partition with
all the image contents&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A kernel image file for the kernel to boot&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A ramdisk file to boot with the kernel&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whereas the overcloud-hardened-uefi-full whole-disk image consists of a single
compressed qcow2 image containing the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A partition layout containing UEFI boot, legacy boot, and a root partition&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The root partition contains a single lvm group with a number of logical
volumes of different sizes which are mounted at /, /tmp, /var, /var/log, etc.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When a partition image is deployed, ironic-python-agent does the following on
the baremetal disk being deployed to:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Creates the boot and root partitions on the disk&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copies the partition image contents to the root partition&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Populates the empty boot partition with everything required to boot, including
the kernel image, ramdisk file, a generated grub config, and an installed
grub binary&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When a whole-disk image is deployed, ironic-python-agent simply copies the whole
image to the disk.&lt;/p&gt;
&lt;p&gt;When the partition image deploy boots for the first time, the root partition
grows to take up all of the available disk space. This mechanism is provided
by the base cloud image. There is no equivalent partition growing mechanism
for a multi-volume LVM whole-disk image.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The capability to build and deploy a whole-disk overcloud image has been
available for many releases, but it is time to switch to this as the default.
Doing this will avoid the following issues and bring the following benefits:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;As of CentOS-8.4, grub will stop supporting installation of the bootloader on a
UEFI system. ironic-python-agent depends on grub installs to set up EFI boot
with partition images, so UEFI boot will stop working when CentOS 8.4 is
used.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other than this new grub behaviour, keeping partition boot working in
ironic-python-agent has been a development burden and involves code
complexity which is avoided for whole-disk deployments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO users increasingly want to deploy with UEFI Secure Boot
enabled; this is only possible with whole-disk images that use the signed
shim bootloader.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Partition images need to be distributed with kernel and ramdisk files, adding
complexity to file management of deployed images compared to a single
whole-disk image file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a class="reference external" href="https://teknoarticles.blogspot.com/2017/07/build-and-use-security-hardened-images.html"&gt;requirements for a hardened image&lt;/a&gt; include having separate volumes for
root, data etc. All TripleO users get the security benefit of hardened images
when a whole-disk image is used.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We currently need dedicated CI jobs, both in the upstream check/gate (when the
relevant files change) and in periodic integration lines, to build and
publish the latest ‘current-tripleo’ version of the hardened images. In the long
term, only a single hardened UEFI whole-disk image needs to be built and
published, reducing the CI footprint. (In the short term, the CI footprint may go up
so the whole-disk image can be published, and while hardened vs hardened-uefi
jobs are refactored.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Every place where the partition image overcloud-full.qcow2 is built, published, or used
needs to be updated to use overcloud-hardened-uefi-full.qcow2 by default.&lt;/p&gt;
&lt;p&gt;This blueprint will be considered complete when it is possible to follow the
default path in the documentation and the result is an overcloud deployed
with whole-disk images.&lt;/p&gt;
&lt;section id="image-upload-tool"&gt;
&lt;h4&gt;Image upload tool&lt;/h4&gt;
&lt;p&gt;The default behaviour of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;image&lt;/span&gt; &lt;span class="pre"&gt;upload&lt;/span&gt;&lt;/code&gt; needs to be
aware that overcloud-hardened-uefi-full.qcow2 should be uploaded by default
when it is detected in the local directory.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="reviewing-image-build-yaml"&gt;
&lt;h4&gt;Reviewing image build YAML&lt;/h4&gt;
&lt;p&gt;Once the periodic jobs are updated, image YAML defining
overcloud-hardened-full can be deleted, leaving only
overcloud-hardened-uefi-full. Other refactoring can be done such as renaming
-python3.yaml back to -base.yaml.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="reviewing-partition-layout"&gt;
&lt;h4&gt;Reviewing partition layout&lt;/h4&gt;
&lt;p&gt;Swift data is stored in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/srv&lt;/span&gt;&lt;/code&gt; and according to the criteria of hardened
images this should be in its own partition. This will need to be added to the
existing partition layout for whole-disk UEFI images.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="partition-growing"&gt;
&lt;h4&gt;Partition growing&lt;/h4&gt;
&lt;p&gt;On node first boot, a replacement mechanism for growing the root partition is
required. This is a harder problem for the multiple LVM volumes which the
whole-disk image creates. Generally the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/var&lt;/span&gt;&lt;/code&gt; volume should grow to take
available disk space because this is where TripleO and OpenStack services store
their state, but sometimes &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/srv&lt;/span&gt;&lt;/code&gt; will need to grow for Swift storage, and
sometimes there may need to be a proportional split of multiple volumes. This
suggests that there will be new tripleo-heat-templates variables which will
specify the volume/proportion growth behaviour on a per-role basis.&lt;/p&gt;
&lt;p&gt;A new utility is required which automates this LVM volume growing
requirement. It could be implemented a number of ways:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;A new project/package containing the utility, installed on the image and
run by first-boot or early tripleo-ansible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A utility script installed by a diskimage-builder/tripleo-image-elements
element and run by first-boot or as a first-boot ansible task (post-provisioning
or early deploy).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement entirely in an ansible role, either in its own repository, or as
part of tripleo-ansible. It would be run by early tripleo-ansible.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This utility will also be useful to other cloud workloads which use LVM-based
images, so some consideration is needed for making it a general-purpose tool
which can be used outside an overcloud image. Because of this, option 2 is
proposed initially as the preferred way to install this utility, and it will
be proposed as a new element in diskimage-builder. Being coupled with
diskimage-builder means the utility can make assumptions about the partition
layout:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;a single Volume Group that defaults to name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vg&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;volume partitions are formatted with XFS, which can be resized while mounted&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
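&lt;p&gt;The proportional growth described above can be sketched as follows. This is a minimal illustration only: the function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;split_free_extents&lt;/span&gt;&lt;/code&gt;, the mount-point-to-percentage mapping format and the extent counts are assumptions, since this spec does not define the utility's interface.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;

```python
def split_free_extents(free_extents, proportions):
    """Split free LVM extents between volumes by integer percentage.

    free_extents: unallocated physical extents in the volume group.
    proportions: hypothetical mapping of mount point to percentage.
    """
    total = sum(proportions.values())
    if total not in range(1, 101):
        raise ValueError("percentages must sum to between 1 and 100")
    # Integer division rounds down, so the sum never exceeds free_extents.
    return {
        mount: free_extents * percent // 100
        for mount, percent in proportions.items()
    }


if __name__ == "__main__":
    # Hypothetical role that stores Swift data: grow both /var and /srv.
    print(split_free_extents(1000, {"/var": 60, "/srv": 40}))
```

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Because the volumes are assumed to be XFS, a real utility would then extend each logical volume and grow its filesystem while mounted; that step is deliberately left out of this sketch.&lt;/p&gt;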
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Because of the grub situation, the only real alternative is dropping support
for UEFI boot, which means only supporting legacy BIOS boot indefinitely.
This would likely draw negative feedback from end-users.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;All deployments will use images that comply with the hardened-image
requirements, so deployments will gain these security benefits&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Whole disk images are UEFI Secure Boot enabled, so this blueprint brings us
closer to recommending that Secure Boot be switched on always. This will
validate to users that they have deployed boot/kernel binaries signed by Red
Hat.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;Nodes upgraded in-place will continue to be partition image based, and
new/replaced nodes will be deployed with whole-disk images. This doesn’t have
a specific upgrade implication, unless we document an option for replacing
every node in order to ensure all nodes are deployed with whole-disk images.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;There is little end-user impact other than:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The change of habit required to use overcloud-hardened-uefi-full.qcow2
instead of overcloud-full.qcow2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The need to set the heat variable if custom partition growing behaviour is
required&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;There is no known performance impact with this change.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;All deployer impacts have already been mentioned elsewhere.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;There are no developer impacts beyond the already mentioned deployer impacts.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Steve Baker &amp;lt;&lt;a class="reference external" href="mailto:sbaker%40redhat.com"&gt;sbaker&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;python-tripleoclient: image upload command, handle
overcloud-hardened-uefi-full.qcow2 as the default if it exists locally&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-ansible/cli-overcloud-node-provision.yaml: detect
overcloud-hardened-uefi-full.(qcow2|raw) as the default if it exists in
/var/lib/ironic/images&lt;/p&gt;&lt;/li&gt;
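&lt;li&gt;&lt;p&gt;RDO jobs:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;add periodic job for overcloud-hardened-uefi-full&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;remove periodic job for overcloud-hardened-full&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;modify image publishing jobs to publish overcloud-hardened-uefi-full.qcow2&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;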
&lt;li&gt;&lt;p&gt;tripleo-image-elements/overcloud-partition-uefi: add &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/srv&lt;/span&gt;&lt;/code&gt; logical volume
for swift data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-quickstart-extras: Use the whole_disk_images=True variable to switch to
downloading/uploading/deploying overcloud-hardened-uefi-full.qcow2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-ci/featureset001/002: Enable whole_disk_images=True&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;diskimage-builder: Add new element which installs utility for growing LVM
volumes based on specific volume/proportion mappings&lt;/p&gt;&lt;/li&gt;
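&lt;li&gt;&lt;p&gt;tripleo-common/image-yaml:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;refactor to remove non-uefi hardened image&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;rename -python3.yaml back to -base.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add the element which installs the grow partition utility&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;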
&lt;li&gt;&lt;p&gt;tripleo-heat-templates: Define variables for driving partition growth
volume/proportion mappings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-ansible: Consume the volume/proportion mapping and run the volume
growing utility on every node in early boot.&lt;/p&gt;&lt;/li&gt;
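&lt;li&gt;&lt;p&gt;tripleo-docs:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Update the documentation for deploying whole-disk images by default&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document variables for controlling partition growth&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;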
&lt;/ul&gt;
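&lt;p&gt;The "use the whole-disk image as the default if it exists" work items could be sketched roughly as below. Only the image file names and the /var/lib/ironic/images directory come from this spec; the function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;default_overcloud_image&lt;/span&gt;&lt;/code&gt;, the search order and the fallback to overcloud-full.qcow2 are assumptions made for illustration.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;

```python
from pathlib import Path

# File names taken from this spec; the search order is an assumption.
PREFERRED = (
    "overcloud-hardened-uefi-full.qcow2",
    "overcloud-hardened-uefi-full.raw",
)
FALLBACK = "overcloud-full.qcow2"


def default_overcloud_image(image_dir):
    """Return the whole-disk image if present, else the partition image."""
    directory = Path(image_dir)
    for name in PREFERRED:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    # No whole-disk image found: keep the existing default (assumed).
    return directory / FALLBACK


if __name__ == "__main__":
    print(default_overcloud_image("/var/lib/ironic/images"))
```

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;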
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Unless diskimage-builder requires separate tracking to add the partition
growth utility, all tasks can be tracked under this blueprint.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;section id="image-building-and-publishing"&gt;
&lt;h3&gt;Image building and publishing&lt;/h3&gt;
&lt;p&gt;Periodic jobs which build images, and jobs which build and publish images to
downloadable locations need to be updated to build and publish
overcloud-hardened-uefi-full.qcow2. Initially this can be in parallel with
the existing overcloud-full.qcow2 publishing, but eventually that can be
switched off.&lt;/p&gt;
&lt;p&gt;overcloud-hardened-full.qcow2 is the same as
overcloud-hardened-uefi-full.qcow2 except that it only supports legacy BIOS
booting. Since overcloud-hardened-uefi-full.qcow2 supports both legacy BIOS
and UEFI boot, the periodic jobs which build overcloud-hardened-full.qcow2
can be switched off from Wallaby onwards (assuming these changes are backported
as far back as Wallaby).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="ci-support"&gt;
&lt;h3&gt;CI support&lt;/h3&gt;
&lt;p&gt;CI jobs which consume published images need to be modified so they can
download overcloud-hardened-uefi-full.qcow2 and deploy it as a whole-disk
image.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The TripleO Deployment Guide needs to be modified so that
overcloud-hardened-uefi-full.qcow2 is referred to throughout, and so that it
correctly documents deploying a whole-disk image based overcloud.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Fri, 30 Apr 2021 00:00:00 </pubDate></item><item><title>Cleaning container healthchecks</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/xena/healthcheck-cleanup.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/clean-container-healthchecks"&gt;https://blueprints.launchpad.net/tripleo/+spec/clean-container-healthchecks&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We don’t rely on the &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-common/src/branch/master/healthcheck"&gt;container healthcheck&lt;/a&gt; results for anything in the
infrastructure. They consume time and resources, and they are maintained
inconsistently. We can at least remove the ones that aren’t hitting an actual
API healthcheck endpoint.&lt;/p&gt;
&lt;p&gt;This proposal was discussed during a &lt;a class="reference external" href="https://etherpad.opendev.org/p/tripleo-xena-drop-healthchecks"&gt;session at the Xena PTG&lt;/a&gt;.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Since we moved the services to containers, first with the docker engine, then
with podman, container healthchecks have been implemented and used.&lt;/p&gt;
&lt;p&gt;While the very idea of healthchecks isn’t bad, the way we (TripleO) are
making and using them is mostly wrong:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;no action is taken upon healthcheck failure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;some (most) aren’t actually checking if the service is working, but merely
that the service container is running&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The healthchecks such as &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-common/src/commit/a072a7f07ea75933a2372b1a95ae960095a3250e/healthcheck/common.sh#L49"&gt;healthcheck_port&lt;/a&gt;, &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-common/src/commit/a072a7f07ea75933a2372b1a95ae960095a3250e/healthcheck/common.sh#L85"&gt;healthcheck_listen&lt;/a&gt;,
&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-common/src/commit/a072a7f07ea75933a2372b1a95ae960095a3250e/healthcheck/common.sh#L95"&gt;healthcheck_socket&lt;/a&gt; as well as most of the scripts calling
&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-common/src/commit/a072a7f07ea75933a2372b1a95ae960095a3250e/healthcheck/common.sh#L28"&gt;healthcheck_curl&lt;/a&gt; are mostly NOT doing anything more than ensuring a
service is running - and we already have this info when the container is
“running” (good), “restarting” (not good) or “exited” (with a non-0 code
- bad).&lt;/p&gt;
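&lt;p&gt;To illustrate that the container state alone already carries this information, the following sketch maps a state and exit code to the reading given above. The function name and the returned labels are illustrative assumptions and are not part of TripleO or podman; treating a zero exit code as fine is also an assumption.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;

```python
def classify_container(state, exit_code=0):
    """Map a container state to the reading described in the text."""
    if state == "running":
        return "good"
    if state == "restarting":
        return "not good"
    if state == "exited":
        # Assumption: a zero exit code (e.g. a one-shot container) is fine.
        return "bad" if exit_code != 0 else "good"
    return "unknown"


if __name__ == "__main__":
    for state, code in (("running", 0), ("restarting", 0), ("exited", 1)):
        print(state, code, classify_container(state, code))
```

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;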
&lt;p&gt;Also, podman’s healthcheck implementation relies on systemd transient
services and &lt;a class="reference external" href="https://www.freedesktop.org/software/systemd/man/systemd.timer.html"&gt;timers&lt;/a&gt;. Basically, for each container, a new systemd
unit is created and injected, as well as a new timer - meaning systemd calls
podman. This isn’t good for the hosts, especially those already under
heavy load.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;A deep cleaning of the current healthchecks is needed, targeting those such as
&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-common/src/commit/a072a7f07ea75933a2372b1a95ae960095a3250e/healthcheck/common.sh#L95"&gt;healthcheck_socket&lt;/a&gt;, &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-common/src/commit/a072a7f07ea75933a2372b1a95ae960095a3250e/healthcheck/common.sh#L49"&gt;healthcheck_port&lt;/a&gt;, and &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-common/src/commit/a072a7f07ea75933a2372b1a95ae960095a3250e/healthcheck/common.sh#L28"&gt;healthcheck_curl&lt;/a&gt;
that aren’t calling an actual API healthcheck endpoint. This list isn’t
exhaustive.&lt;/p&gt;
&lt;p&gt;This will drastically reduce the number of “podman” calls, leading
to fewer resource issues, and make the output easier to understand when we list
the processes or services.&lt;/p&gt;
&lt;p&gt;In case an Operator wants to get some status information, they can leverage
an existing validation:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;tripleo&lt;/span&gt; &lt;span class="n"&gt;validator&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;validation&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This validation can be launched from the Undercloud directly, and will gather
remote status from every overcloud node, then provide a clear summary.&lt;/p&gt;
&lt;p&gt;Such a validation could also be launched from a third-party monitoring
instance, provided it has the needed info (mostly the inventory).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There are multiple alternatives, which we could even implement as a step-by-step
solution, though each of them would most likely require its own
specification and discussion:&lt;/p&gt;
&lt;section id="replace-the-listed-healthchecks-by-actual-service-healthchecks"&gt;
&lt;h4&gt;Replace the listed healthchecks by actual service healthchecks&lt;/h4&gt;
&lt;p&gt;Doing so would give a better understanding of the stack health, but
will not solve the issue with podman calls (hence resource consumption and
related problems).
Such healthchecks can be launched from an external tool, for instance based
on a host’s cron, or an actual service.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="call-the-healthchecks-from-an-external-tool"&gt;
&lt;h4&gt;Call the healthchecks from an external tool&lt;/h4&gt;
&lt;p&gt;Doing so would prevent the potential resource issues with the “podman exec”
calls we’re currently seeing, while centralizing the results and
providing a better way to get metrics and stats.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="keep-things-as-is"&gt;
&lt;h4&gt;Keep things as-is&lt;/h4&gt;
&lt;p&gt;Because we have to list this one, but there are hints this isn’t the right
thing to do (hence the current spec).&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;No real security impact. Fewer services/calls might lead to a smaller attack
surface, and might prevent some &lt;em&gt;denial of service&lt;/em&gt; situations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;No Upgrade impact.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The End User doesn’t have access to the healthcheck anyway - that’s more for
the operator.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The systems will be less stressed, and this can improve the current situation
regarding performance and stability.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;There is no “deployer impact” unless the deployer is also the operator.&lt;/p&gt;
&lt;p&gt;For the operator, there’s a direct impact: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;podman&lt;/span&gt; &lt;span class="pre"&gt;ps&lt;/span&gt;&lt;/code&gt; won’t be able to show
the health status anymore or, at least, not for the containers without such
checks.&lt;/p&gt;
&lt;p&gt;But the operator is able to leverage the service-status validation instead -
this validation will even provide more information since it takes into account
the failed containers, a thing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;podman&lt;/span&gt; &lt;span class="pre"&gt;ps&lt;/span&gt;&lt;/code&gt; doesn’t show without the proper
option, and even with it, it’s not that easy to filter.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;In order to improve the healthchecks, especially for the API endpoints, service
developers will need to implement specific tests in the app.&lt;/p&gt;
&lt;p&gt;Once such a test exists and is working and reliable, they can plug it into any
available healthcheck tooling - either the embedded container healthcheck, or a
dedicated service as described in the possible future work items.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;cjeanner&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Triage existing healthchecks and, if they aren’t calling an actual endpoint,
deactivate them in tripleo-heat-templates&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure the stack stability isn’t degraded by this change, and properly
document the “service-status” validation with the Validation Framework Team&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The second work item is more about gathering empirical data over the long term - we currently
don’t have actual data, apart from a &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1923607"&gt;Launchpad issue&lt;/a&gt; pointing to a problem
that may be caused by the way healthchecks are launched.&lt;/p&gt;
&lt;section id="possible-future-work-items"&gt;
&lt;h4&gt;Possible future work items&lt;/h4&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Initiate a discussion with CloudOps (metrics team) regarding a dedicated
healthcheck service, and how to integrate it properly within TripleO&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Initiate a cross-Team work toward actual healthcheck endpoints for the
services in need&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Those are just here for the sake of evolution. Proper specs will be needed
in order to frame the work.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;For steps 1 and 2, there are no real dependencies.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Testing will require different things:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Proper metrics in order to ensure there’s no negative impact - and that any
impact is measurable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Proper assurance that the removal of the healthchecks doesn’t affect the services
in a negative way&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Proper testing of the validations, especially “service-status” in order to
ensure it’s reliable enough to be considered as a replacement at some point&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;A documentation update will be needed regarding the overall healthcheck topic.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://developers.redhat.com/blog/2019/04/18/monitoring-container-vitality-and-availability-with-podman/"&gt;Podman Healthcheck implementation and usage&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Thu, 22 Apr 2021 00:00:00 </pubDate></item><item><title>TripleO First Principles</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/policy/first-principles.html</link><description>
 
&lt;p&gt;The TripleO first principles are a set of principles that guide decision making
around future direction with TripleO. The principles are used to evaluate
choices around changes in direction and architecture. Every impactful decision
does not necessarily have to follow all the principles, but we use them to make
informed decisions about trade offs when necessary.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;When evaluating technical direction within TripleO, a better and more
consistent method is needed to weigh pros and cons of choices. Defining the
principles is a step towards addressing that need.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="policy"&gt;
&lt;h2&gt;Policy&lt;/h2&gt;
&lt;section id="definitions"&gt;
&lt;h3&gt;Definitions&lt;/h3&gt;
&lt;dl&gt;
&lt;dt&gt;Framework&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;The functional implementation which exposes a set of standard enforcing
interfaces that can be consumed by a service to describe that service’s
deployment and management. The framework includes all functional pieces that
implement such interfaces, such as CLIs, APIs, or libraries.&lt;/p&gt;
&lt;p&gt;Example: tripleoclient/tripleo-common/tripleo-ansible/tripleo-heat-templates&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Service&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;The unit of deployment. A service will implement the necessary framework
interfaces in order to describe its deployment.&lt;/p&gt;
&lt;p&gt;The framework does not enforce a particular service boundary, other than by
prescribing best practices. For example, a given service implementation could
deploy both a REST API and a database, when in reality the API and database
should more likely be deployed as their own services and expressed as
dependencies.&lt;/p&gt;
&lt;p&gt;Example: Keystone, MariaDB, RabbitMQ&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Third party integrations&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Service implementations that are developed and maintained outside of the
TripleO project. These are often implemented by vendors aiming to add support
for their products within TripleO.&lt;/p&gt;
&lt;p&gt;Example: Cinder drivers, Neutron plugins&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="first-principles"&gt;
&lt;h3&gt;First Principles&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;[UndercloudMigrate] No Undercloud Left Behind&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;TripleO itself as the deployment tool can be upgraded. We do
not immediately propose what the upgrade will look like or the technology
stack, but we will offer an upgrade path or a migration path.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[OvercloudMigrate] No Overcloud Left Behind&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;An overcloud deployed with TripleO can be upgraded to the next major version
with either an in place upgrade or migration.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[DefinedInterfaces] TripleO will have a defined interface specification.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;We will document clear boundaries between internal and external
(third party integrations) interfaces.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will document the supported interfaces of the framework in the same
way that a code library or API would be documented.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Individual services of the framework can be deployed and tested in
isolation from other services. Service dependencies are expressed per
service, but do not preclude using the framework to deploy a service
isolated from its dependencies. Whether that is successful or not
depends on how the service responds to missing dependencies, and that is
a behavior of the service and not the framework.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The interface will offer update and upgrade tasks as first class citizens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The interface will offer validation tasks as first class citizens&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[OSProvisioningSeparation] Separation between operating system provisioning
and software configuration.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Baremetal configuration, network configuration and base operating system
provisioning is decoupled from the software deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The software deployment will have a defined set of minimal requirements
which are expected to be in-place before it begins the software deployment.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Specific linux distributions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Specific linux distribution versions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Password-less access via ssh&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Password-less sudo access&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pre-configured network bridges&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[PlatformAgnostic] Platform agnostic deployment tooling.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;TripleO is sufficiently isolated from the platform in a way that allows
for use in a variety of environments (baremetal/virtual/containerized/OS
version).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The developer experience is such that it can easily be run in
isolation on developer workstations&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[DeploymentToolingScope] The deployment tool has a defined scope&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Data collection tool.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Responsible for collecting host and state information and posting to a
centralized repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Handles writes to central repository (e.g. read information from
repository, do aggregation, post to central repository)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A configuration tool to configure software and services as part of the
deployment&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Manages Software Configuration&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Directories&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Service (containerized or non-containerized) state&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Software packages&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Executes commands related to “configuration” of a service.
Example: Configure OpenStack AZs, Neutron Networks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Isolated executions that are invoked independently by the orchestration tool&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Single execution state management&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Input is configuration data/tasks/etc&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A single execution produces the desired state or reports failure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Idempotent&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read-only communication with centralized data repository for configuration data&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployment process depends on an orchestration tool to handle various
task executions.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Task graph manager&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Task transport and execution tracker&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Aware of hosts and work to be executed on the hosts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ephemeral deployment tooling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Efficient execution&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scale and reliability/durability are first class citizens&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[CI/CDTooling] TripleO functionality should be considered within the context
of being directly invoked as part of a CI/CD pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[DebuggableFramework] Diagnosis of deployment/configuration failures within
the framework should be quick and simple. Interfaces should be provided to
enable debuggability of service failures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[BaseOSBootstrap] TripleO can start from a base OS and go to full cloud&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;It should be able to start at any point after base OS, but should be able
to handle the initial OS bootstrap&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[PerServiceManagement] TripleO can manage individual services in isolation,
and express and rely on dependencies and ordering between services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[Predictable/Reproducible/Idempotent] The deployment is predictable&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The operator can determine what changes will occur before actually applying
those changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployment is reproducible in that the operator can re-run the
deployment with the same set of inputs and achieve the same results across
different environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployment is idempotent in that the operator can re-run the
deployment with the same set of inputs and the deployment will not change
from the state reached when it was first deployed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the case where a service needs to restart a process, the framework
will have an interface that the service can use to notify of the
needed restart. In this way, the restarts are predictable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The interface for service restarts will allow for a service to describe
how it should be restarted in terms of dependencies on other services,
simultaneous restarts, or sequential restarts.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
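&lt;p&gt;As a toy illustration of the predictable and idempotent principle, the sketch below applies a desired configuration to a current state and reports which keys changed; re-running it with the same input reports nothing. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;apply_config&lt;/span&gt;&lt;/code&gt; helper and the dict-based state are hypothetical and do not reflect any TripleO interface.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;

```python
def apply_config(current, desired):
    """Bring current in line with desired; return the keys that changed."""
    changed = sorted(
        key
        for key, value in desired.items()
        if key not in current or current[key] != value
    )
    current.update(desired)
    return changed


if __name__ == "__main__":
    state = {"workers": 2}
    desired = {"workers": 4, "debug": False}
    print(apply_config(state, desired))  # first run lists the changed keys
    print(apply_config(state, desired))  # second run lists nothing
```

&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Computing the list of changed keys without calling the update step would give the operator the preview of changes that the first sub-item asks for.&lt;/p&gt;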
&lt;/section&gt;
&lt;section id="non-principles"&gt;
&lt;h3&gt;Non-principles&lt;/h3&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;[ContainerImageManagement] The framework does not manage container images.
Other than using a given container image to start a container, the framework
does not encompass common container image management to include:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Building container images&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Patching container images&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Serving or mirroring container images&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching container images&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Specific tools for container image and runtime management and that need to
leverage the framework during deployment are expected to be implemented as
services.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[SupportingTooling] Tools and software executed by the framework to deploy
services or tools required prior to service deployment by the framework are
not considered part of the framework itself.&lt;/p&gt;
&lt;p&gt;Examples: podman, TCIB, image-serve, nova-less/metalsmith&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives-history"&gt;
&lt;h2&gt;Alternatives &amp;amp; History&lt;/h2&gt;
&lt;p&gt;Many, if not all, of the principles are already well agreed upon and understood as
core to TripleO. Writing them down as policy makes them more discoverable and
official.&lt;/p&gt;
&lt;p&gt;Historically, there have been instances when decisions have been guided by
desired technical implementation or outcomes. Recording the principles does not
necessarily mean those decisions would stop, but it does allow for a more
reasonable way to think about the trade offs.&lt;/p&gt;
&lt;p&gt;We do not need to adopt any principles, or record them. However, there is no
harm in doing so.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="author-s"&gt;
&lt;h3&gt;Author(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary author:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;James Slagle &amp;lt;&lt;a class="reference external" href="mailto:jslagle%40redhat.com"&gt;jslagle&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&amp;lt;launchpad-id or None&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="milestones"&gt;
&lt;h3&gt;Milestones&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="revision-history"&gt;
&lt;h2&gt;Revision History&lt;/h2&gt;
&lt;table class="docutils align-default" id="id1"&gt;
&lt;caption&gt;&lt;span class="caption-text"&gt;Revisions&lt;/span&gt;&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Release Name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;v0.0.1&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Introduced&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0
Unported License.
&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Mon, 19 Apr 2021 00:00:00 </pubDate></item><item><title>TripleO Split Control Plane from Compute/Storage Support</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/split-controlplane.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/split-controlplane"&gt;https://blueprints.launchpad.net/tripleo/+spec/split-controlplane&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec introduces support for a mode of deployment where the controlplane
nodes are deployed and then batches of compute/storage nodes can be added
independently.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently TripleO deploys all services, for all roles (groups of nodes), in
a single heat stack. This works quite well for small to medium size deployments,
but for very large environments there is considerable benefit to dividing the
nodes into batches, e.g. when deploying many hundreds or thousands of compute nodes.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Scalability can be improved by deploying a fairly static controlplane and then
adding batches of, e.g., compute nodes when demand requires scale out. The overhead
of updating all the nodes in every role for any scale out operation is non-trivial,
and although this is somewhat mitigated by the split from heat deployed servers
to config download &amp;amp; ansible for configuration, making modular deployments easier
is of benefit when needing to scale deployments to very large environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Risk reduction - there are often requests to avoid any update to controlplane
nodes when adding capacity for, e.g., compute or storage, and modular deployments
make this easier as no modification is required to the controlplane nodes to
add, e.g., compute nodes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This spec is not intended to cover all the possible ways of achieving modular deployments,
but instead to outline the requirements and give an overview of the interfaces we need to
consider to enable this flexibility.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;To enable incremental changes, I’m assuming we could still deploy the controlplane
nodes via the existing architecture, e.g. Heat deploys the nodes/networks and we
then use config download to configure those nodes via ansible.&lt;/p&gt;
&lt;p&gt;To deploy compute nodes, we have several options:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Deploy multiple “compute only” heat stacks, which would generate
ansible playbooks via config download, and consume some output data
from the controlplane stack.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy additional nodes via mistral, then configure them via
ansible (today this still requires heat to generate the
playbooks/inventory even if it’s a transient stack).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy nodes via ansible, then configure them via ansible (again,
with the config download mechanism we have available today we’d
need heat to generate the configuration data).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The above doesn’t consider a “pure ansible” solution as we would have to first make ansible
role equivalents for all the composable service templates available, and that effort
is out of scope for this spec.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="scope-and-phases"&gt;
&lt;h3&gt;Scope and Phases&lt;/h3&gt;
&lt;p&gt;The three items listed in the overview cover an incremental approach
and the first phase is to implement the first item. Though this item
adds an additional dependency on Heat, this is done only to allow the
desired functionality using what is available today. In future phases
any additional dependency on Heat will need to be addressed and any
changes done during the first phase should be minimal and focus on
parameter exposure between Heat stacks. Implementation of the other
items in the overview could span multiple OpenStack development cycles
and additional details may need to be addressed in future
specifications.&lt;/p&gt;
&lt;p&gt;If a deployer is able to do the following simple scenario, then this
specification is implemented as phase 1 of the larger feature:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Deploy a single undercloud with one control-plane network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a Heat stack called overcloud-controllers with 0 compute nodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a Heat stack called overcloud-computes which may be used by the controllers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the APIs of the controllers to boot an instance on the computes deployed from the overcloud-computes Heat stack&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the above scenario the majority of the work involves exposing the
correct parameters between Heat stacks so that a controller node is
able to use a compute node as if it were an external service. This is
analogous to how TripleO provides a template where properties of an
external Ceph cluster may be used by TripleO to configure a service
like Cinder which uses the external Ceph cluster.&lt;/p&gt;
&lt;p&gt;The simple scenario above is possible without network isolation. In
the more complex workload site vs control site scenario, described
in the following section, network traffic will not be routed through
the controller. How the networking aspect of that deployment scenario
is managed will need to be addressed in a separate specification and
the overall effort will likely span multiple OpenStack development
cycles.&lt;/p&gt;
&lt;p&gt;For the phase of implementation covered in this specification, the
compute nodes will be PXE booted by Ironic from the same provisioning
network as the controller nodes during deployment. Instances booted on
these compute nodes could connect to a provider network to which their
compute nodes have direct access. Alternatively these compute nodes
could be deployed with physical access to the network which hosts
the overlay networks. The resulting overcloud should look the same as
one in which the compute nodes were deployed as part of the overcloud
Heat stack. Thus, the controller and compute nodes will run the same
services they normally would regardless of whether the deployment was
split between two undercloud Heat stacks. The services on the
controller and compute nodes could be composed to multiple servers
but determining the limits of composition is out of scope for the
first phase.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="example-usecase-scenario-workload-vs-control-sites"&gt;
&lt;h3&gt;Example Usecase Scenario: Workload vs Control Sites&lt;/h3&gt;
&lt;p&gt;One application of this feature includes the ability to deploy
separate workload and control sites. A control site provides
management and OpenStack API services, e.g. the Nova API and
Scheduler. A workload site provides resources needed only by the
workload, e.g. Nova compute resources with local storage in
availability zones which directly serve workload network traffic
without routing back to the control site. Though there would be
additional latency between the control site and workload site with
respect to managing instances, there would be no reason that the
workload itself could not perform adequately once running and each
workload site would have a smaller footprint.&lt;/p&gt;
&lt;a class="reference internal image-reference" href="../../_images/ceph-details.png"&gt;&lt;img alt="Diagram of an example control site with multiple workload sites" class="align-center" src="../../_images/ceph-details.png" style="width: 629px; height: 445px;"/&gt;&lt;/a&gt;
&lt;p&gt;This scenario is included in this specification as an example
application of the feature. This specification does not aim to address
all of the details of operating separate control and workload sites
but only to describe how the proposed feature, &lt;em&gt;deployment of
independent controlplane and compute nodes&lt;/em&gt;, for TripleO could be
built upon to simplify deployment of such sites in future versions of
TripleO. For example the blueprint to make it possible to deploy
multiple Ceph clusters in the overcloud &lt;a class="footnote-reference brackets" href="#id3" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; could be applied to
provide a separate Ceph cluster per workload site, but its scope only
focuses on changes to roles in order to enable only that feature; it
is orthogonal to this proposal.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Alternatives to the incremental change outlined in the overview include reimplementing service
configuration in ansible, such that nodes can be configured via playbooks without dependency
on the existing heat+ansible architecture.  Work is ongoing in this area, e.g. the ansible roles
to deploy services on k8s, but this spec is primarily concerned with finding an interim
solution that enables our current architecture to scale to very large deployments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Potentially sensitive data such as passwords will need to be shared between the controlplane
stack and the compute-only deployments.  Given the admin-only nature of the undercloud I think
this is OK.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Users will have more flexibility and control with regard to how they
choose to scale their deployments. An example of this includes
separate control and workload sites as mentioned in the example use
case scenario.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Potentially better performance at scale, although the total time could be increased assuming
each scale out is serialized.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;It is already possible to deploy multiple overcloud Heat stacks from
one undercloud, but if there are parts of the TripleO tool-chain which
assume a single Heat stack, they may need to be updated.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;shardy&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other assignees:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;gfidente
fultonj&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Proof of concept showing how to deploy independent controlplane and compute nodes using already landed patches &lt;a class="footnote-reference brackets" href="#id4" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and by overriding the EndpointMap&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If there are problems with overriding the EndpointMap, rework all-nodes-config to output the “all nodes” hieradata and vip details, such that they could span stacks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Determine what data are missing in each stack and propose patches to expose the missing data to each stack that needs it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the proof of concept to support adding a separate and minimal ceph cluster (mon, mgr, osd) through a heat stack separate from the controller node’s heat stack.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Refine how the data is shared between each stack to improve the user experience&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the documentation to include an example of the new deployment method&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retrospect and write a follow up specification covering details necessary for the next phase&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Ideally scale testing will be performed to validate the scalability
aspects of this work. For the first phase, any changes done to enable
the simple scenario described under Scope and Phases will be tested
manually and the existing CI will ensure they do not break current
functionality. Changes implemented in the follow up phases could have
CI scenarios added.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The deployment documentation will need to be updated to cover the configuration of
split controlplane environments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id3" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/deploy-multiple-ceph-clusters"&gt;Make it possible to deploy multiple Ceph clusters in the overcloud&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id4" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/q/topic:compute_only_stack2"&gt;Topic: topic:compute_only_stack2&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Tue, 23 Feb 2021 00:00:00 </pubDate></item><item><title>TripleO Repos Single Source</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/xena/tripleo-repos-single-source.html</link><description>
 
&lt;p&gt;This proposal lays out the plan to use tripleo-repos as a single source
to install and configure non-base OS repos for TripleO - including
setting the required DLRN hashes.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-repos-single-source"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-repos-single-source&lt;/a&gt;&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Reviewing the code base, there are multiple places where repos are
specified. For example, in the release files we are setting the
configuration that is applied by &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-quickstart/src/commit/d14d81204036a02562c3f4efd7acb3b38cb6ae95/roles/repo-setup/templates/repo_setup.sh.j2#L72"&gt;repo setup role&lt;/a&gt;.
Some of the other repo/version configurations are included in:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-repos"&gt;tripleo repos&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-quickstart/src/commit/d14d81204036a02562c3f4efd7acb3b38cb6ae95/roles/repo-setup/templates/repo_setup.sh.j2#L72"&gt;repo setup role&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-quickstart/src/commit/d14d81204036a02562c3f4efd7acb3b38cb6ae95/config/release/tripleo-ci/CentOS-8/master.yml#L93"&gt;release config files&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-common/src/commit/d3286377132ee6b0689a39e52858c07954711d13/container-images/tcib/base/base.yaml#L59"&gt;container tooling (base tcib file)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-ansible/src/commit/509e630baa92673e72e641635d5742da01b4dc3b/tripleo_ansible/roles/tripleo_podman/vars/redhat-8.2.yml"&gt;tripleo-ansible&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.rdoproject.org/r/31439"&gt;rdo config&lt;/a&gt; (example)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-heat-templates/src/commit/125f45820255efe370af1024080bafc695892faa/environments/lifecycle/undercloud-upgrade-prepare.yaml"&gt;tripleo-heat-templates&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-operator-ansible/src/commit/14a601a47be217386df83512fae3a2e5aa5444a3/roles/tripleo_container_image_build/molecule/default/converge.yml#L172"&gt;tripleo-operator-ansible&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The process of setting repo versions requires getting and
transforming DLRN hashes, for example resolving ‘current-tripleo’
to a particular DLRN build ID and specifying the correct proxies.
Currently a large portion of this work is done in the release files
resulting in sections of complicated and fragile Bash scripts -
duplicated across numerous release files.&lt;/p&gt;
&lt;p&gt;This duplication, coupled with the various locations in use
for setting repo configurations, modules and supported versions
is confusing and error prone.&lt;/p&gt;
&lt;p&gt;There should be one source of truth for which repos are installed
within a tripleo deployment and how they are installed.
Single-sourcing all these functions will avoid the current
problems of duplication, over-writing settings and version confusion.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;This proposal puts forward using tripleo-repos as the ‘source of truth’
for setting repo configurations, modules and supported versions -
including setting the DLRN hashes required to specify exact repo
versions to install for upstream development/CI workflows.&lt;/p&gt;
&lt;p&gt;Having a single source of truth for repo config, modules, etc. will make
development and testing more consistent, reliable and easier to debug.&lt;/p&gt;
&lt;p&gt;The intent is to use the existing tripleo-repos repo for this work and
not to create a new repo. It is as yet to be determined if we will add
a v2/versioned api or how we will handle the integration with the
existing functionality there.&lt;/p&gt;
&lt;p&gt;We aim to modularize the design and implementation of the proposed tripleo-repos
work. Two sub systems in particular have been identified that can be
implemented independently of, and ultimately to be consumed by, tripleo-repos;
the resolution of delorean build hashes from known tags (i.e. resolving
‘current-tripleo’ to a particular DLRN build ID) and the configuration of dnf
repos and modules will be implemented as independent python modules, with
their own unit tests, clis, ansible modules etc.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="integration-points"&gt;
&lt;h3&gt;Integration Points&lt;/h3&gt;
&lt;p&gt;The new work in tripleo-repos will have to support all
the cases currently in use and will have to integrate with:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;DLRN Repos&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;release files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;container and overcloud image builds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;rdo config&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;yum/dnf repos and modules&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ansible (Ansible module)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;promotion pipeline - ensuring the correct DLRN hashes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Incorporating the DLRN hash functionality makes the tool
more complex. Unit tests will be required to guard
against frequent breakages. This is one of the reasons that we decided to split
this DLRN hash resolution into its own dedicated python module
‘tripleo-get-hash’ for which we can have independent unit tests.&lt;/p&gt;
&lt;p&gt;The scope of the new tripleo-repos tool will be limited to upstream
development/CI workflows.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Functionality to set repos, modules and versions is already available today.
It would be possible to leave the status quo or:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Use rdo config to set one version per release - however, this would not
address the issue of changing DLRN hashes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an rpm that lays down /etc/tripleo-release, in which container-tools could
be recorded as metadata, similar to /etc/os-release&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;No security impact is anticipated. The work is currently in the tripleo
open-source repos and will remain there - just in a consolidated
place and format.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;Currently there will be no upgrade impact. The new CLI will support
all release versions under support and in use. At a later date,
when the old CLI is deprecated there may be some update
implications.&lt;/p&gt;
&lt;p&gt;However, there may be work to make the emit_releases_file
&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-ci/src/branch/master/scripts/emit_releases_file/emit_releases_file.py"&gt;https://opendev.org/openstack/tripleo-ci/src/branch/master/scripts/emit_releases_file/emit_releases_file.py&lt;/a&gt;
functionality compatible with the new CLI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Work done on the new project branch will offer a different version of CLI, v2.
End users would be able to select which version of the CLI to use - until
the old CLI is deprecated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;No performance impact is expected. Possible performance improvements could
result from ensuring that proxy handling (release file, mirrors, rdoproject)
is done correctly and consistently.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;See the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Other&lt;/span&gt; &lt;span class="pre"&gt;End&lt;/span&gt; &lt;span class="pre"&gt;User&lt;/span&gt; &lt;span class="pre"&gt;Impact&lt;/span&gt;&lt;/code&gt; section.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;The functionality added to tripleo-repos will be written as a Python module
with a CLI and will be able to perform the following primary functions:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Single source the installation of all TripleO related repos&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Include the functionality currently available in the repo-setup role
including creating repos from templates and files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Perform proxy handling such as is done in the release files
(mirrors, using rdoproject for DLRN repos)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get and transform human-readable DLRN hashes - to be implemented as an
independent module.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support setting yum modules such as container-tools - to be implemented
as an independent module.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support enabling and disabling repos and setting their priorities&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The repo-setup role shall remain but it will invoke tripleo-repos.
All options required to be passed to tripleo-repos should be in the
release file.&lt;/p&gt;
&lt;p&gt;Work done on the new project branch will offer a different version of CLI, v2.
Unit tests will be added on this branch to test the new CLI directly.
CI would be flipped to run in the new branch when approved by TripleO teams.
All current unit tests should pass with the new code.&lt;/p&gt;
&lt;p&gt;An Ansible module will be added to call the tripleo-repos
options from Ansible directly without requiring the end
user to invoke the Python CLI from within Ansible.&lt;/p&gt;
&lt;p&gt;The aim is for tripleo-repos to be the single source for all repo related
configuration. In particular the goal is to serve the following 3 personas:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Upstream/OpenStack CI jobs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Downstream/OSP/RHEL jobs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Customer installations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The configuration required to serve each of these use cases is slightly
different. In upstream CI jobs we need to configure the latest current-tripleo
promoted content repos. In downstream/OSP jobs we need to use rhos-release
and in customer installations we need to use subscription manager.&lt;/p&gt;
&lt;p&gt;Because of these differing requirements we are leaning towards storing the
configuration for each in their intended locations, with the upstream config
being the ‘base’ and the downstream config building on top of that (the
implication is that some form of inheritance will be used to avoid duplication).
This was discussed during the &lt;a class="reference external" href="https://etherpad.opendev.org/p/ci-tripleo-repos"&gt;Xena PTG session&lt;/a&gt;.&lt;/p&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;sshnaidm (DF and CI)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;marios (CI and W-release PTL)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;weshay&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;chandankumar&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ysandeep&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;arxcruz&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;rlandy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;other DF members (cloudnull)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="proposed-schedule"&gt;
&lt;h2&gt;Proposed Schedule&lt;/h2&gt;
&lt;p&gt;Investigative work will begin in the W-release cycle on a project branch
in tripleo-repos. The spec will be put forward for approval in the X-release
cycle and impactful and integration work will be visible once the spec
is approved.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;This work has a dependency on the &lt;a class="reference external" href="https://dlrn.readthedocs.io/en/latest/api.html"&gt;DLRN API&lt;/a&gt; and on yum/dnf.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Specific unit tests will be added with the python-based code built.
All current CI tests will run through this work and will
test it on all releases and in various aspects such as:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;container build&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;overcloud image build&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO deployments (standalone, multinode, scenarios, OVB)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;updates and upgrades&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="cli-design"&gt;
&lt;h2&gt;CLI Design&lt;/h2&gt;
&lt;p&gt;Here is an abstract sketch of the intended cli design for the
new tripleo-repos.&lt;/p&gt;
&lt;p&gt;It covers most of the needs discussed in multiple places.&lt;/p&gt;
&lt;section id="scenario-1"&gt;
&lt;h3&gt;Scenario 1&lt;/h3&gt;
&lt;p&gt;The goal is to construct a repo with the correct hash for an integration
or a component pipeline.&lt;/p&gt;
&lt;p&gt;For this scenario:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Any combination of &lt;cite&gt;hash, distro, commit, release, promotion, url&lt;/cite&gt; parameters can be passed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-repos/src/branch/master/tripleo-get-hash"&gt;tripleo-get-hash&lt;/a&gt; module to determine the DLRN build ID&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the calculated DLRN build ID to create and add a repo&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="scenario-2"&gt;
&lt;h3&gt;Scenario 2&lt;/h3&gt;
&lt;p&gt;The goal is to construct any type of yum/dnf repo.&lt;/p&gt;
&lt;p&gt;For this scenario:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Construct and add a yum/dnf repo using a combination of the following parameters&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;filename - filename for saving the resulting repo (mandatory)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;reponame - name of repository (mandatory)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;baseurl - base URL of the repository (mandatory)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;down_url - URL to download repo file from (mandatory/mutually exclusive with baseurl)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;priority - priority of resulting repo (optional)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;enabled - 0/1 whether the repo is enabled or not (default: 1 - enabled)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gpgcheck - whether to check GPG keys for repo (default: 0 - don’t check)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;module_hotfixes - whether to make all RPMs from the repository available (default: 0)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sslverify - whether to verify SSL certificates when fetching repo metadata (default: 1)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;type - type of the repo(default: generic, others: custom and file)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
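&lt;p&gt;As a rough illustration of how these parameters could map onto a repo file, the Python sketch below renders one with the standard library. The helper name render_repo is hypothetical and is not part of tripleo-repos; only the parameter names are taken from the list above. The type parameter and the download path for down_url are left out.&lt;/p&gt;

```python
import configparser
import io


def render_repo(filename, reponame, baseurl=None, down_url=None,
                priority=None, enabled=1, gpgcheck=0,
                module_hotfixes=0, sslverify=1):
    """Return (filename, text) for a yum/dnf .repo file.

    Hypothetical helper: only the parameter names come from the spec.
    """
    # baseurl and down_url are mutually exclusive, and one of them is required.
    if (baseurl is None) == (down_url is None):
        raise ValueError("pass exactly one of baseurl or down_url")
    if down_url is not None:
        # Downloading an existing repo file is out of scope for this sketch.
        raise NotImplementedError("down_url handling is not sketched here")

    # Disable interpolation so that '%' in URLs is written verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser[reponame] = {
        "name": reponame,
        "baseurl": baseurl,
        "enabled": str(enabled),
        "gpgcheck": str(gpgcheck),
        "module_hotfixes": str(module_hotfixes),
        "sslverify": str(sslverify),
    }
    if priority is not None:
        parser[reponame]["priority"] = str(priority)

    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return filename, buf.getvalue()
```

&lt;p&gt;Calling render_repo("demo.repo", "demo", baseurl="https://example.com/repo") returns the target filename and the file body, leaving the decision of where to write it to the caller.&lt;/p&gt;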
&lt;/section&gt;
&lt;section id="scenario-3"&gt;
&lt;h3&gt;Scenario 3&lt;/h3&gt;
&lt;p&gt;The goal is to enable or disable a specific dnf module and also install or
remove a specific package.&lt;/p&gt;
&lt;p&gt;For this scenario:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Specify:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;module name&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;which version to disable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;which version to enable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;which specific package from the module to install (optional)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="scenario-4"&gt;
&lt;h3&gt;Scenario 4&lt;/h3&gt;
&lt;p&gt;The goal is to enable or disable some repos,
remove any associated repo files no longer needed,
and then perform a system update.&lt;/p&gt;
&lt;p&gt;For this scenario:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Specify:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;repo names to be disabled&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;repo names to be enabled&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the files to be removed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;whether to perform the system update&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;tripleo-docs will be updated to point to the new supported
repo/modules/versions setting workflow in tripleo-repos.&lt;/p&gt;
&lt;p&gt;References to old sources of settings such as tripleo-ansible,
release files in tripleo-quickstart and the repo-setup role
will have to be removed and replaced to point to the new
workflow.&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Mon, 25 Jan 2021 00:00:00 </pubDate></item><item><title>Ephemeral Heat Stack for all deployments</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/ephemeral-heat-overcloud.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/ephemeral-heat-overcloud"&gt;https://blueprints.launchpad.net/tripleo/+spec/ephemeral-heat-overcloud&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec proposes using the ephemeral Heat stack model for all deployment
types, including the overcloud.  Using ephemeral Heat is already done for
standalone deployments with the “tripleo deploy” command, and for the
undercloud install as well. Expanding its use to overcloud deployments will
align the different deployment methods into just a single method. It will also
make the installation process more stateless and more predictable,
since there is no Heat stack to get corrupted or possibly have bad state or
configuration.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Maintaining the Heat stack can be problematic due to corruption via either
user or software error. Backups are often not available, and even when they
exist, they do not guarantee that the stack can be recovered. Corruption or
loss of the Heat stack, such as accidental deletion, requires custom recovery
procedures or re-deployments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Heat deployment itself must be maintained, updated, and upgraded. These
tasks are not large efforts, but they are areas of maintenance that would be
eliminated when using ephemeral Heat instead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relying on the long lived Heat process makes the deployment less portable in
that there are many assumptions in TripleO that all commands are run
directly from the undercloud. Using ephemeral Heat would at least allow for
the stack operation and config-download generation to be entirely portable
such that it could be run from any node with python-tripleoclient installed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There are large unknowns in the state of each Heat stack that exists for all
current deployments. These unknowns can cause issues during update/upgrade as
we can’t possibly account for all of these items, such as out of date
parameter usage or old/incorrect resource registry mappings. Having each
stack operation create a new stack will eliminate those issues.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The ephemeral Heat stack model involves starting a short lived heat process
using a database engine for the purposes of creating the stack. The initial
proposal assumes using the MySQL instance already present on the undercloud as
the database engine. To maintain compatibility with the already implemented
“tripleo deploy” code path, SQLite will also be supported for single node
deployments. SQLite may also be supported for other deployments that are
small enough that SQLite is not a bottleneck.&lt;/p&gt;
&lt;p&gt;After the stack is created, the config-download workflow is run to download and
render the ansible project directory to complete the deployment. The short
lived heat process is then killed and the database is deleted. However, enough
artifacts are saved to reproduce the Heat stack if necessary, including the
database dump. The undercloud backup and restore procedure will be modified to
account for the removal of the Heat database.&lt;/p&gt;
&lt;p&gt;This model is already used by the “tripleo deploy” command for the standalone
and undercloud installations and is well proven for those use cases. Switching
the overcloud deployment to also use ephemeral Heat aligns all of the different
deployments to use Heat the same way.&lt;/p&gt;
&lt;p&gt;We can scale the ephemeral Heat processes by using a podman pod that
encapsulates containers for heat-api, heat-engine, and any other processes we
need. Running separate containerized Heat processes instead of a single
heat-all process will allow starting multiple engine workers to allow for
scale. Management and configuration of the heat pod will be fairly prescriptive
and it will use default podman networking as we do not need the Heat processes
to scale beyond a single host. Moving forward, undercloud minions will no
longer install the heat-engine process as a means for scale.&lt;/p&gt;
&lt;p&gt;As part of this change, we will also add the ability to run Heat commands
against the saved database from a given deployment. This will give
operators a way to inspect the Heat stack that was created for debugging
purposes.&lt;/p&gt;
&lt;p&gt;Managing the templates used during the deployment becomes even more important
with this change, as the templates and environments passed to the “overcloud
deploy” command are the entire source of truth to recreate the deployment. We
may consider further management around the templates, such as a git repository
but that is outside the scope of this spec.&lt;/p&gt;
&lt;p&gt;There are some cases where the saved state in the stack is inspected before a
deployment operation. Two examples are comparing the Ceph fsids between the
input and what exists in the stack, as well as checking for a missing
network-isolation.yaml environment.&lt;/p&gt;
&lt;p&gt;In cases such as these, we need a way to perform these checks outside of
inspecting the Heat stack itself. A straightforward way to do these types of
checks would be to add ansible tasks that check the existing deployed overcloud
(instead of the stack) and then cause an error that will stop the deployment if
an invalid change is detected.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The alternative is to make no changes and continue to use Heat as we do today
for the overcloud deployment. With the work that has already been done to
decouple Heat from Nova, Ironic, and now Neutron, it instead seems like the
next iterative step is to use ephemeral Heat for all of our deployment types.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The short lived ephemeral heat process uses no authentication. This is in
contrast to the Heat process we have on the undercloud today that uses Keystone
for authentication. In reality, this change has little effect on security as
all of the sensitive data is actually passed into Heat from the templates. We
should however make sure that the generated artifacts are secured
appropriately.&lt;/p&gt;
&lt;p&gt;Since the Heat process is ephemeral, no change related to SRBAC (Secure RBAC)
is needed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;When users upgrade to Wallaby, the Heat processes will be shut down on the
undercloud, and further stack operations will use ephemeral Heat.&lt;/p&gt;
&lt;p&gt;Upgrade operations for the overcloud will work as expected as all of the update
and upgrade tasks are entirely generated with config-download on each stack
operation. We will however need to ensure proper upgrade testing to be sure
that all services can be upgraded appropriately using ephemeral Heat.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;End users will no longer have a running instance of Heat to interact with or
run heat client commands against. However, we will add management around
starting an ephemeral Heat process with the previously used database for
debugging inspection purposes (stack resource list/show, etc).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The ephemeral Heat process is presently single threaded. Addressing this
limitation by using a podman pod for the Heat processes will allow the
deployment to scale to meet overcloud deployment needs, while keeping the
process ephemeral and easy to manage with just a few commands.&lt;/p&gt;
&lt;p&gt;Using the MySQL database instead of SQLite as the database engine should
alleviate any impact around the database being a bottleneck. Once the
database has been backed up following a deployment operation, it would be wiped from
MySQL so that no state is saved outside of the produced artifacts from the
deployment.&lt;/p&gt;
&lt;p&gt;Alternatively, we can finish the work started in &lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/ussuri/scaling-with-ansible-inventory.html"&gt;Scaling with the Ansible
inventory&lt;/a&gt;. That work will enable deploying the Heat stack with a count of 1
for each role. With that change, the Heat stack operation times will scale with
the number of roles in the deployment, and not the number of nodes, which will
allow for similar performance as currently exists. Even while using the
inventory to scale, we are still likely to have worse performance with a single
heat-all process than we do today. With just a few roles, using just heat-all
becomes a bottleneck.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Initially, deployers will have the option to enable using the ephemeral Heat
model for overcloud deployments, until it becomes the default.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will need to be aware of the new commands that will be added to
enable inspecting the Heat stack for debugging purposes.&lt;/p&gt;
&lt;p&gt;Some service templates may need to be updated where they rely on saved
state in the Heat stack.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;james-slagle&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;The plan is to start prototyping this effort and have the option in place to
use it for a default overcloud deployment in Wallaby. There may be additional
fine tunings that we can finish in the X release, with a plan to backport to
Wallaby. Ideally, we would like to make this the default behavior in Wallaby.
The extent to which that is possible will be determined by the prototype work.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add management of Heat podman pod to tripleoclient&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add option to “overcloud deploy” to use ephemeral Heat&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use code from “tripleo deploy” for management of ephemeral Heat&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure artifacts from the deployment are saved in known locations and
reusable as needed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update undercloud backup/restore to account for changes related to Heat
database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add commands to enable running Heat commands with a previously used
database&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify undercloud minion installer to no longer install heat-engine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Switch some CI jobs over to use the optional ephemeral Heat&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Eventually make using ephemeral Heat the default in “overcloud deploy”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Align the functionality from “tripleo deploy” into the “overcloud deploy”
command and eventually deprecate “tripleo deploy”.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;This work depends on other ongoing work to decouple Heat from management of
other OpenStack API resources, particularly the composable networks v2 work.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Network Data v2 Blueprint - &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/network-data-v2-ports"&gt;https://blueprints.launchpad.net/tripleo/+spec/network-data-v2-ports&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Initially, the change will be optional within the “overcloud deploy” command.
We can choose some CI jobs to switch over to opt-in. Eventually, it will become
the default behavior and all CI jobs would then be affected.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation updates will be necessary to detail the changes around using
ephemeral Heat. Specifically:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;User Interface changes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to run Heat commands to inspect the stack&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Where artifacts from the deployment were saved and how to use them&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/ussuri/scaling-with-ansible-inventory.html"&gt;Scaling with the Ansible inventory&lt;/a&gt; specification&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Tue, 01 Dec 2020 00:00:00 </pubDate></item><item><title>Network Data v2 - node ports and node network config</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/triplo-network-data-v2-node-ports.html</link><description>
 
&lt;p&gt;With “Network Data v2” the goal is to move management of network resources
out of the heat stack. The schema spec &lt;a class="footnote-reference brackets" href="#id5" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; talked about the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_data.yaml&lt;/span&gt;&lt;/code&gt; format and managing networks, segments and subnets. This
spec follows up with node ports for composable networks and moving the node
network configuration action to the baremetal/network configuration workflow.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem description&lt;/h2&gt;
&lt;p&gt;Applying a network change on day 2 currently requires a full stack update
since network resources such as ports are managed by heat. It has also been
problematic to create ports for large scale deployments; neutron on the single
node undercloud gets overwhelmed and it is difficult to throttle port creation
in Heat. As an early indication of the performance of port creation with the
proposed ansible module:&lt;/p&gt;
&lt;p&gt;Performance stats: 100 nodes x 3 networks = 300 ports&lt;/p&gt;
&lt;div class="highlight-text notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;        4xCPU 1.8 GHz (8GB)             8x CPU 2.6 GHz (12GB)
        -------------------  --------------------------------
Concurr:                 10          20         10          4
........     ..............   .........  .........  .........
Create:      real 5m58.006s   1m48.518s  1m51.998s  1m25.022s
Delete:      real 4m12.812s   0m47.475s  0m48.956s  1m19.543s
Re-run:      real 0m19.386s    0m4.389s   0m4.453s   0m4.977s
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
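&lt;p&gt;To make the Create row easier to compare across machines, the timings can be converted into a port-creation rate. The Python sketch below is illustrative arithmetic only: the timings are copied from the table above, while the helper and dictionary names are made up for this example.&lt;/p&gt;

```python
def seconds(stamp):
    """Convert a `time`-style stamp such as '5m58.006s' to seconds."""
    minutes, rest = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(rest)


PORTS = 100 * 3  # 100 nodes x 3 networks, as stated above the table

# "Create" timings from the table, keyed by (machine, concurrency).
create_times = {
    ("4xCPU 1.8 GHz", 10): "5m58.006s",
    ("8xCPU 2.6 GHz", 20): "1m48.518s",
    ("8xCPU 2.6 GHz", 10): "1m51.998s",
    ("8xCPU 2.6 GHz", 4): "1m25.022s",
}

# Ports created per second for each measured configuration.
rates = {key: PORTS / seconds(stamp) for key, stamp in create_times.items()}

for (machine, concurrency), rate in rates.items():
    print(f"{machine} concurrency={concurrency}: {rate:.2f} ports/s")
```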
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;Extend the baremetal provisioning workflow that runs before overcloud
deployment to also create ports for composable networks. The baremetal
provisioning step already creates ports for the provisioning network. Moving
the management of ports for composable networks to this workflow will
consolidate all port management into one workflow.&lt;/p&gt;
&lt;p&gt;Also make the baremetal provisioning workflow execute the tripleo-ansible
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo_network_config&lt;/span&gt;&lt;/code&gt; role to configure node networking after
node provisioning.&lt;/p&gt;
&lt;p&gt;The deploy workflow would be:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Operator defines composable networks in network data YAML file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operator provisions composable networks by running the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;network&lt;/span&gt; &lt;span class="pre"&gt;provision&lt;/span&gt;&lt;/code&gt; command, providing the network
data YAML file as input.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operator defines roles and nodes in the baremetal deployment YAML file. This
YAML also defines the networks for each role.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operator deploys baremetal nodes by running the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;node&lt;/span&gt; &lt;span class="pre"&gt;provision&lt;/span&gt;&lt;/code&gt; command. This step creates ports in
neutron and also configures networking, including composable networks, on
the nodes, using an ansible role to apply network config with os-net-config
&lt;a class="footnote-reference brackets" href="#id6" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operator deploys heat stack including the environment files produced by the
commands executed in the previous steps by running the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;deploy&lt;/span&gt;&lt;/code&gt; command.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operator executes config-download to install and configure openstack on the
overcloud nodes. &lt;em&gt;(optional - only if the overcloud deploy command was executed with
--stack-only)&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Harald Jensås &amp;lt;&lt;a class="reference external" href="mailto:hjensas%40redhat.com"&gt;hjensas&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="approver-s"&gt;
&lt;h3&gt;Approver(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary approver:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;TODO&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="implementation-details"&gt;
&lt;h3&gt;Implementation Details&lt;/h3&gt;
&lt;p&gt;The baremetal YAML definition will be extended, adding the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;networks&lt;/span&gt;&lt;/code&gt; and the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_config&lt;/span&gt;&lt;/code&gt; keys in role &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;defaults&lt;/span&gt;&lt;/code&gt; as well as per-instance to support
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fixed_ip&lt;/span&gt;&lt;/code&gt; addressing, manually pre-created port resource and per-node
network configuration template.&lt;/p&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;networks&lt;/span&gt;&lt;/code&gt; key will replace the current &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nic&lt;/span&gt;&lt;/code&gt; key. Until the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nic&lt;/span&gt;&lt;/code&gt; key is
deprecated, either can be used, but not both at the same time. Networks in
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;networks&lt;/span&gt;&lt;/code&gt; will support a boolean key &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vif&lt;/span&gt;&lt;/code&gt; which indicates whether the port
should be attached in Ironic. If no network with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vif:&lt;/span&gt; &lt;span class="pre"&gt;true&lt;/span&gt;&lt;/code&gt; is
specified, an implicit one for the control plane will be appended:&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;vif&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
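&lt;p&gt;The defaulting rule can be sketched in a few lines of Python. This is a minimal illustration that assumes the networks list is already parsed into dictionaries; the function name with_implicit_ctlplane is hypothetical and does not come from tripleo-ansible.&lt;/p&gt;

```python
import copy


def with_implicit_ctlplane(networks):
    """Return a copy of `networks`, appending the implicit control plane
    entry when no network has `vif: true`."""
    result = copy.deepcopy(networks)
    # A missing `vif` key is treated the same as `vif: false`.
    if not any(net.get("vif", False) for net in result):
        result.append({"network": "ctlplane", "vif": True})
    return result
```

&lt;p&gt;The input list is copied rather than modified in place, so the operator-supplied definition stays untouched.&lt;/p&gt;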
&lt;p&gt;For networks with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vif:&lt;/span&gt; &lt;span class="pre"&gt;true&lt;/span&gt;&lt;/code&gt;, ports will be created by metalsmith. For
networks with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vif:&lt;/span&gt; &lt;span class="pre"&gt;false&lt;/span&gt;&lt;/code&gt; (or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vif&lt;/span&gt;&lt;/code&gt; not specified) the workflow will
create neutron ports based on the YAML definition.&lt;/p&gt;
&lt;p&gt;The neutron ports will initially be tagged with the &lt;em&gt;stack name&lt;/em&gt; and the
instance &lt;em&gt;hostname&lt;/em&gt;. These tags are used for idempotency. The ansible module
managing ports will get all ports with the relevant tags and then add/remove
ports based on the expanded roles defined in the Baremetal YAML definition.
(The &lt;em&gt;hostname&lt;/em&gt; and &lt;em&gt;stack_name&lt;/em&gt; tags are also added to ports created with heat
in this tripleo-heat-templates change &lt;a class="footnote-reference brackets" href="#id8" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, to enable &lt;em&gt;adoption&lt;/em&gt; of neutron
ports created by heat for the upgrade scenario.)&lt;/p&gt;
&lt;p&gt;Additionally, the ports will be tagged with the ironic node uuid when this is
available. The full set of tags is shown in the example below.&lt;/p&gt;
&lt;div class="highlight-json notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"controller-1-External"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"tripleo_ironic_uuid=&amp;lt;IRONIC_NODE_UUID&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="s2"&gt;"tripleo_hostname=controller-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="s2"&gt;"tripleo_stack_name=overcloud"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
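&lt;p&gt;A minimal Python sketch of how such tags could drive an idempotent add/remove decision follows. It works on plain dictionaries rather than the neutron API, and the function names, as well as the assumption that ports are identified by a name of the form hostname-Network, are illustrative only.&lt;/p&gt;

```python
def port_tags(stack_name, hostname, ironic_uuid=None):
    """Build the tag set for a port, mirroring the example above."""
    tags = {
        f"tripleo_stack_name={stack_name}",
        f"tripleo_hostname={hostname}",
    }
    if ironic_uuid is not None:
        tags.add(f"tripleo_ironic_uuid={ironic_uuid}")
    return tags


def reconcile(existing_ports, desired_names, stack_name, hostname):
    """Return (names_to_create, names_to_delete) for one instance."""
    wanted = port_tags(stack_name, hostname)
    # Only ports carrying both identifying tags belong to this instance.
    owned = {p["name"] for p in existing_ports if wanted.issubset(p["tags"])}
    desired = set(desired_names)
    return sorted(desired - owned), sorted(owned - desired)
```

&lt;p&gt;Running reconcile a second time with the same inputs, after the first result has been applied, yields two empty lists, which is the idempotent behavior the tags are meant to enable.&lt;/p&gt;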
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;In deployments where baremetal nodes have multiple physical NICs,
multiple networks can have &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vif:&lt;/span&gt; &lt;span class="pre"&gt;true&lt;/span&gt;&lt;/code&gt;, so that VIF attach
in ironic and proper neutron port binding happens. In a scenario
where neutron on the Undercloud is managing the switch this would
enable automation of the Top-of-Rack switch configuration.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Mapping of the port data for overcloud nodes will go into a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;NodePortMap&lt;/span&gt;&lt;/code&gt;
parameter in tripleo-heat-templates. The map will contain submaps for each
node, keyed by the node name. Initially the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;NodePortMap&lt;/span&gt;&lt;/code&gt; will be consumed by
alternative &lt;em&gt;fake-port&lt;/em&gt;
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::{{role.name}}::Ports::{{network.name}}Port&lt;/span&gt;&lt;/code&gt; resource templates.
In the final implementation the environment file created can be extended and
the entire &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::{{role.name}}&lt;/span&gt;&lt;/code&gt; resource can be replaced with a
template that references parameters in the generated environment directly, i.e. a
re-implemented &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;puppet/role.role.j2.yaml&lt;/span&gt;&lt;/code&gt; without the server and port
resources. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;NodePortMap&lt;/span&gt;&lt;/code&gt; will be added to the
&lt;em&gt;overcloud-baremetal-deployed.yaml&lt;/em&gt; created by the workflow creating the
overcloud node port resources.&lt;/p&gt;
&lt;p&gt;Network ports for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vif:&lt;/span&gt; &lt;span class="pre"&gt;false&lt;/span&gt;&lt;/code&gt; networks will be managed by a new ansible
module, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo_overcloud_network_ports&lt;/span&gt;&lt;/code&gt;. The input for this module will be a
list of instance definitions as generated by the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo_baremetal_expand_roles&lt;/span&gt;&lt;/code&gt; ansible module. The
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo_baremetal_expand_roles&lt;/span&gt;&lt;/code&gt; ansible module will be extended to add
network/subnet information from the baremetal deployment YAML definition.&lt;/p&gt;
&lt;p&gt;The baremetal provision workflow will be extended to write an ansible inventory.
We should try to extend tripleo-ansible-inventory so that the baremetal
provisioning workflow can re-use existing code to create the inventory.
The inventory will be used to configure networking on the provisioned nodes
using the &lt;strong&gt;tripleo-ansible&lt;/strong&gt; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo_network_config&lt;/span&gt;&lt;/code&gt; ansible role.&lt;/p&gt;
&lt;section id="already-deployed-servers"&gt;
&lt;h4&gt;Already Deployed Servers&lt;/h4&gt;
&lt;p&gt;The Baremetal YAML definition will also be used to describe baremetal
deployments with &lt;strong&gt;pre-deployed&lt;/strong&gt; servers. In this scenario there is no Ironic node to
update, no ironic UUID to add to a port’s tags and no ironic node to attach
VIFs to.&lt;/p&gt;
&lt;p&gt;All ports, including the ctlplane port will be managed by the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo_overcloud_network_ports&lt;/span&gt;&lt;/code&gt; ansible module. The Baremetal YAML
definition for a deployment with pre-deployed servers will have to include an
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;instance&lt;/span&gt;&lt;/code&gt; entry for each pre-deployed server. This entry will have the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;managed&lt;/span&gt;&lt;/code&gt; key set to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;false&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It should be possible for an already deployed server to have a management
address that is completely separate from the tripleo managed addresses. The
Baremetal YAML definition can be extended to carry a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;management_ip&lt;/span&gt;&lt;/code&gt; field
for this purpose. If no management address is available, the ctlplane
network entry for pre-deployed instances must have &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fixed_ip&lt;/span&gt;&lt;/code&gt; configured.&lt;/p&gt;
&lt;p&gt;The deployment workflow will &lt;em&gt;short circuit&lt;/em&gt; the baremetal provisioning of
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;managed:&lt;/span&gt; &lt;span class="pre"&gt;false&lt;/span&gt;&lt;/code&gt; instances. The Baremetal YAML definition can define a
mix of &lt;em&gt;already deployed server&lt;/em&gt; instances, and instances that should be
provisioned via metalsmith. See &lt;a class="reference internal" href="#baremetal-yaml-pre-provsioned"&gt;&lt;span class="std std-ref"&gt;Example: Baremetal YAML for Already Deployed Servers&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
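&lt;p&gt;The short-circuit and the address requirement described above could look roughly like the Python sketch below. It operates on already expanded instance dictionaries; the function names, the hostname key and the assumption that managed defaults to true when omitted are illustrative and not taken from the implementation.&lt;/p&gt;

```python
def split_instances(instances):
    """Partition instances into (to_provision, pre_deployed)."""
    to_provision, pre_deployed = [], []
    for inst in instances:
        # Assumption: an instance without a `managed` key is managed.
        if inst.get("managed", True):
            to_provision.append(inst)
        else:
            pre_deployed.append(inst)
    return to_provision, pre_deployed


def check_pre_deployed(inst):
    """Raise ValueError if a pre-deployed instance has no usable address."""
    if inst.get("management_ip"):
        return
    ctlplane = [
        net for net in inst.get("networks", [])
        if net.get("network") == "ctlplane"
    ]
    # Without a management address, the ctlplane entry needs a fixed_ip.
    if not any(net.get("fixed_ip") for net in ctlplane):
        name = inst.get("hostname")
        raise ValueError(f"{name}: set management_ip or a ctlplane fixed_ip")
```

&lt;p&gt;Only the first list returned by split_instances would be handed to metalsmith; every entry in the second list would be validated with check_pre_deployed before ports are created for it.&lt;/p&gt;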
&lt;/section&gt;
&lt;section id="yaml-examples"&gt;
&lt;h4&gt;YAML Examples&lt;/h4&gt;
&lt;section id="example-baremetal-yaml-definition-with-defaults-properties"&gt;
&lt;h5&gt;Example: Baremetal YAML definition with defaults properties&lt;/h5&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Controller&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hostname_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-%index%&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;network_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;templates/multiple_nics/multiple_nics.j2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;physical_bridge_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;br-ex&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;public_interface_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;nic1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;network_deployment_actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'CREATE'&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;net_config_data_lookup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;vif&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;external&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;external_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_mgmt&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_mgmt_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hostname_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-%index%&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;network_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;templates/multiple_nics/multiple_nics.j2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;physical_bridge_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;br-ex&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;public_interface_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;nic1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;network_deployment_actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'CREATE'&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;net_config_data_lookup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;vif&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="example-baremetal-yaml-definition-with-per-instance-overrides"&gt;
&lt;h5&gt;Example: Baremetal YAML definition with per-instance overrides&lt;/h5&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Controller&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;3&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hostname_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-%index%&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;network_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;templates/multiple_nics/multiple_nics.j2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;physical_bridge_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;br-ex&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;public_interface_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;nic1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;network_deployment_actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'CREATE'&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;net_config_data_lookup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;bond_interface_ovs_options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;vif&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;external&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;external_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_mgmt&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_mgmt_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-0&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;node00&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;vif&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.21.11.100&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;node01&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
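&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;external&lt;/span&gt;&lt;span class="w"/&gt;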
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-1-external&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;node02&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ComputeLeaf1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hostname_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-leaf1-%index%&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-leaf1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-leaf1-0&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;node03&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;network_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;templates/multiple_nics/multiple_nics_dpdk.j2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;physical_bridge_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;br-ex&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;public_interface_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;nic1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;network_deployment_actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'CREATE'&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;net_config_data_lookup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;num_dpdk_interface_rx_queues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;vif&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.21.12.105&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-leaf1-0-tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="example-baremetal-yaml-for-already-deployed-servers"&gt;
&lt;span id="baremetal-yaml-pre-provsioned"/&gt;&lt;h5&gt;Example: Baremetal YAML for Already Deployed Servers&lt;/h5&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Controller&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;3&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hostname_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-%index%&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;network_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;templates/multiple_nics/multiple_nics.j2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;external&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;external_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_mgmt&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_mgmt_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;managed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-0&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.10&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.11&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.12&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hostname_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-%index%&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;network_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;templates/multiple_nics/multiple_nics.j2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-0&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;managed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.100&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;managed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.101&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="example-nodenetworkdatamappings"&gt;
&lt;h5&gt;Example: NodeNetworkDataMappings&lt;/h5&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;NodePortMap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;controller-0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.9 (2001:DB8:24::9)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.9/24 (2001:DB8:24::9/64)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.9 ([2001:DB8:24::9])&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.9 (2001:DB8:18::9)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.9/24 (2001:DB8:18::9/64)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.9 ([2001:DB8:18::9])&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.0.9 (2001:DB8:19::9)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.0.9/24 (2001:DB8:19::9/64)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.0.9 ([2001:DB8:19::9])&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;compute-0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.15 (2001:DB8:24::15)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.15/24 (2001:DB8:24::15/64)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.15 ([2001:DB8:24::15])&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.15 (2001:DB8:18::15)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.15/24 (2001:DB8:18::15/64)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.15 ([2001:DB8:18::15])&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.0.15 (2001:DB8:19::15)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.0.15/24 (2001:DB8:19::15/64)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_address_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.0.15 ([2001:DB8:19::15])&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
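&lt;p&gt;The values in the example above are written as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;IPv4&lt;/span&gt; &lt;span class="pre"&gt;(IPv6)&lt;/span&gt;&lt;/code&gt; pairs. The fragment below is an illustrative sketch only, assuming the parenthesized values are the IPv6 alternatives of each field, of how the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ctlplane&lt;/span&gt;&lt;/code&gt; entry for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;controller-0&lt;/span&gt;&lt;/code&gt; might look on an IPv6-only network. The bracketed URI form is quoted here because an unquoted leading &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;[&lt;/span&gt;&lt;/code&gt; would otherwise be parsed as the start of a YAML flow sequence rather than as a string.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Hypothetical IPv6-only variant of the entry shown above (assumption, not generated output)
NodePortMap:
  controller-0:
    ctlplane:
      ip_address: 2001:DB8:24::9
      ip_subnet: 2001:DB8:24::9/64
      # Quoted so that the value stays a string instead of a one-item list
      ip_address_uri: "[2001:DB8:24::9]"
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;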
&lt;/section&gt;
&lt;section id="example-ansible-inventory"&gt;
&lt;h5&gt;Example: Ansible inventory&lt;/h5&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;Controller&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;vars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;role_networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;External&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;InternalApi&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;role_networks_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;External&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;external&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;InternalApi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;Tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;networks_all&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;External&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;InternalApi&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;neutron_physical_bridge_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;br-ex&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;neutron_public_interface_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;nic1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tripleo_network_config_os_net_config_mappings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;network_deployment_actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'CREATE'&lt;/span&gt;&lt;span class="p p-Indicator"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;'UPDATE'&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_subnet_cidr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_mtu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1500&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_dns_nameservers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;dns_search_domains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_host_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_cidr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_host_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_mtu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1500&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_vlan_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;20&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant_cidr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant_gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.0.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant_host_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant_mtu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1500&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hosts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;controller-0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ansible_host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.9&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.24.9&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;internal_api_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.9&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;tenant_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.0.9&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;Compute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;vars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;role_networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;InternalApi&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;role_networks_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;InternalApi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;internal_api&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;Tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;networks_all&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;External&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;InternalApi&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Tenant&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;neutron_physical_bridge_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;br-ex&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;neutron_public_interface_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;nic1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tripleo_network_config_os_net_config_mappings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;network_deployment_actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'CREATE'&lt;/span&gt;&lt;span class="p p-Indicator"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;'UPDATE'&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_subnet_cidr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_mtu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1500&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.25.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_dns_nameservers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;dns_search_domains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_host_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_cidr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_host_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_mtu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1500&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;internal_api_vlan_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;20&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant_cidr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant_api_gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.1.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant_host_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;tenant_mtu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1500&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hosts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;compute-0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ansible_host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.25.15&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ctlplane_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.25.15&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;internal_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.15&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;tenant_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.19.1.15&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="todo"&gt;
&lt;h3&gt;TODO&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Constraint validation, for example &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;BondInterfaceOvsOptions&lt;/span&gt;&lt;/code&gt; uses
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;allowed_pattern:&lt;/span&gt; &lt;span class="pre"&gt;^((?!balance.tcp).)*$&lt;/span&gt;&lt;/code&gt; to ensure balance-tcp bond mode is
not used, as it is known to cause packet loss.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Write ansible inventory after baremetal provisioning&lt;/p&gt;
&lt;p&gt;Create an ansible inventory, similar to the inventory created by
config-download. The ansible inventory is required to apply network
configuration to the deployed nodes.&lt;/p&gt;
&lt;p&gt;We should try to extend tripleo-ansible-inventory so that the baremetal
provisioning workflow can re-use existing code to create the inventory.&lt;/p&gt;
&lt;p&gt;It is likely that it makes sense for the workflow to also run the
tripleo-ansible role tripleo_create_admin to create the &lt;em&gt;tripleo-admin&lt;/em&gt;
ansible user.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extend baremetal provisioning workflow to create neutron ports and
update the ironic node &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;extra&lt;/span&gt;&lt;/code&gt; field with the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo_networks&lt;/span&gt;&lt;/code&gt; map.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The baremetal provisioning workflow needs a &lt;em&gt;pre-deployed-server&lt;/em&gt; option
that causes it to skip deploying baremetal nodes and only create network ports.
When this option is used, the baremetal deployment YAML file will also
describe the already provisioned nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply and validate network configuration using the &lt;strong&gt;tripleo-ansible&lt;/strong&gt;
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo_network_config&lt;/span&gt;&lt;/code&gt; ansible role. This step will be integrated in
the provisioning command.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Disable and remove management of composable network ports in
tripleo-heat-templates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Change the Undercloud and Standalone deploy to apply network configuration
prior to creating the ephemeral heat stack, using the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo_network_config&lt;/span&gt;&lt;/code&gt; ansible role.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
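&lt;p&gt;As a rough illustration of work item 4, the network configuration step could be driven by a playbook like the one below, run against the inventory written in work item 1. This is a sketch under assumptions: the play name, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hosts:&lt;/span&gt; &lt;span class="pre"&gt;all&lt;/span&gt;&lt;/code&gt; pattern and the use of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;become&lt;/span&gt;&lt;/code&gt; are illustrative, and only the role name is taken from this spec.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Hypothetical playbook; only the role name comes from this spec.
- name: Apply network configuration to provisioned nodes
  hosts: all        # assumption: target every host in the generated inventory
  become: true      # assumption: network changes need root privileges
  roles:
    - role: tripleo_network_config
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;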
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Multinode OVB CI jobs with network-isolation will be updated to test the new
workflow.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h2&gt;Upgrade Impact&lt;/h2&gt;
&lt;p&gt;When an upgrade switches to network ports managed outside of the heat stack,
the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;PortDeletionPolicy&lt;/span&gt;&lt;/code&gt; must be set to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;retain&lt;/span&gt;&lt;/code&gt; during the update/upgrade
&lt;em&gt;prepare&lt;/em&gt; step, so that the existing neutron ports (which will be adopted by
the pre-heat port management workflow) are not deleted when running the
update/upgrade &lt;em&gt;converge&lt;/em&gt; step.&lt;/p&gt;
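&lt;p&gt;A minimal sketch of how this could be expressed is a heat environment file passed to the &lt;em&gt;prepare&lt;/em&gt; command. The file name is hypothetical; only the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;PortDeletionPolicy&lt;/span&gt;&lt;/code&gt; parameter and its &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;retain&lt;/span&gt;&lt;/code&gt; value come from this spec.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# retain-ports.yaml (hypothetical file name)
parameter_defaults:
  # Keep the existing neutron ports when heat stops managing them,
  # so that the pre-heat port management workflow can adopt them.
  PortDeletionPolicy: retain
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;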
&lt;p&gt;Moving node network configuration out of tripleo-heat-templates will require
manual (or scripted) migration of settings controlled by heat template
parameters to the input file used for baremetal/network provisioning. At least
the following parameters are affected:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;NeutronPhysicalBridge&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NeutronPublicInterface&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NetConfigDataLookup&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NetworkDeploymentActions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Parameters that will be deprecated:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;NetworkConfigWithAnsible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{role.name}}NetworkConfigTemplate&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NetworkDeploymentActions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{role.name}}NetworkDeploymentActions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;BondInterfaceOvsOptions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NumDpdkInterfaceRxQueues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{role.name}}LocalMtu&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NetConfigDataLookup&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DnsServers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DnsSearchDomains&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ControlPlaneSubnetCidr&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HypervisorNeutronPublicInterface&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HypervisorNeutronPhysicalBridge&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The environment files used to select one of the pre-defined nic config
templates will no longer work. The template to use must be set in the YAML
defining the baremetal/network deployment. This affects the following
environment files:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;environments/net-2-linux-bonds-with-vlans.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environments/net-bond-with-vlans.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environments/net-bond-with-vlans-no-external.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environments/net-dpdkbond-with-vlans.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environments/net-multiple-nics.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environments/net-multiple-nics-vlans.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environments/net-noop.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environments/net-single-nic-linux-bridge-with-vlans.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environments/net-single-nic-with-vlans.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environments/net-single-nic-with-vlans-no-external.j2.yaml&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
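&lt;p&gt;For illustration only, selecting a nic config template in the baremetal/network deployment YAML might look like the sketch below. The key layout (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;defaults&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_config&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;template&lt;/span&gt;&lt;/code&gt;) is an assumption based on the baremetal deployment format in the TripleO documentation and may not match the final schema of this spec. The role name, count and template path are hypothetical.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Hypothetical baremetal deployment entry; key names are assumptions.
- name: Compute
  count: 1
  defaults:
    network_config:
      # Replaces the choice previously made by an environments/net-*.j2.yaml file
      template: /home/stack/nic-config/single_nic_vlans.j2
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;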
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The documentation effort is &lt;strong&gt;heavy&lt;/strong&gt; and will need to be incrementally
updated. As a minimum, a separate page explaining the new process must be
created.&lt;/p&gt;
&lt;p&gt;The TripleO docs will need updates in many sections, including:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/install_overcloud.html"&gt;TripleO OpenStack Deployment&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html#"&gt;Provisioning Baremetal Before Overcloud Deploy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/custom_networks.html"&gt;Deploying with Custom Networks&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/network_isolation.html"&gt;Configuring Network Isolation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html"&gt;Deploying Overcloud with L3 routed networking&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h2&gt;Alternatives&lt;/h2&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Not changing how ports are created&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In this case we keep creating the ports with heat. This is the do-nothing
alternative.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a completely separate workflow for composable network ports&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A separate workflow that can run before/after node provisioning. It can read
the same YAML format as baremetal provisioning, or it can have its own YAML
format.&lt;/p&gt;
&lt;p&gt;The problem with this approach is that we lose the possibility to store
relations between neutron-port and baremetal node in a database. As in, we’d
need our own database (a file) maintaining the relationships.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;We need to implement this workflow anyway for a pre-deployed
server scenario, but instead of a completely separate workflow
the baremetal deploy workflow can take an option to not
provision nodes.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create ports in ironic and bind neutron ports&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Instead of creating ports unknown to ironic, create ports for the ironic
nodes in the baremetal service.&lt;/p&gt;
&lt;p&gt;The issue is that ironic does not have a concept of virtual ports, so we
would have to either add this support in ironic, switch TripleO to use
neutron trunk ports or create &lt;em&gt;fake&lt;/em&gt; ironic ports that don’t actually
reflect NICs on the baremetal node. (An ironic spec &lt;a class="footnote-reference brackets" href="#id7" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; discusses
one approach for virtual port support, but it was abandoned in favor of
neutron trunk ports.)&lt;/p&gt;
&lt;p&gt;With each PTG there is a recurring suggestion to replace neutron with a
more lightweight IPAM solution. However, the effort to actually integrate
it properly with ironic and neutron for composable networks probably isn’t
time well spent.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id5" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/752437"&gt;Review: Spec for network data v2 format&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id6" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/os-net-config"&gt;os-net-config&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id7" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id4"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/277853"&gt;Abandoned spec for VLAN Aware Baremetal Instances&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id8" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/761845"&gt;Review: Add hostname and stack_name tags to ports&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Thu, 29 Oct 2020 00:00:00 </pubDate></item><item><title>TripleO Ceph Ganesha Integration for Manila</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ceph-ganesha.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-ceph-ganesha"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-ceph-ganesha&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Starting in the Octopus release, Ceph has its own day1 tool called cephadm and
its own day2 tool called orchestrator which will replace ceph-ansible.&lt;/p&gt;
&lt;p&gt;During the Wallaby cycle TripleO will no longer use ceph-ansible for Ceph
deployment and instead use cephadm &lt;a class="footnote-reference brackets" href="#id10" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; as described in &lt;a class="footnote-reference brackets" href="#id9" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. Ganesha deserves
special attention because for its deployment we will use special functionalities
in cephadm &lt;a class="footnote-reference brackets" href="#id10" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; meant to deploy the Ganesha service standalone when the Ceph
cluster is external.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;In TripleO we support deployment of Ganesha both when the Ceph cluster is
managed by TripleO and when it is not.&lt;/p&gt;
&lt;p&gt;When the Ceph cluster is &lt;em&gt;not&lt;/em&gt; managed by TripleO, the Ganesha service must be
deployed standalone; that is, without any additional core Ceph daemon, and it
should instead be configured to use the external Ceph MON and MDS daemons.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;An ansible task will trigger cephadm &lt;a class="footnote-reference brackets" href="#id10" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; with special arguments for it to stand
up a standalone Ganesha container and to it we will provide:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;the Ceph cluster config file, generated using tripleo-ceph-client &lt;a class="footnote-reference brackets" href="#id11" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; role&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the Ceph cluster keyring to interact with MDS&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the Ganesha config file with pointers to the Ceph config/keyring to use&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The container will then be controlled by pacemaker, as it is today, reusing
the same code which today manages the ceph-nfs systemd service created by
ceph-ansible.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Forking and reusing the existing ceph-ansible role for ceph-nfs has been
discussed but ultimately discarded, as that would have moved ownership of the
Ganesha deployment tasks into TripleO, while our goal remains to keep ownership
where the subject expertise is, in the Ceph deployment tool.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None, the same code which TripleO would already use for the generation of the
Ceph cluster config and keyrings will be consumed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;Some upgrade tasks which stop and remove the pre-existing ceph-nfs container
and systemd unit will be added to clean up the system from the ceph-ansible
managed resources.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None, the existing input parameters will be reused to drive the newer deployment
tool.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;No changes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;No impact on users.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;The Ganesha config file will be generated using a specific tripleo-ceph task
while previously, with ceph-ansible, this was created by ceph-ansible itself.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;The existing implementation which depends on ceph-ansible will remain
in-tree for at least 1 deprecation cycle. By reusing the existing Heat
input parameters we should be able to transparently make the Ganesha
deployment happen with ceph-ansible or the new role just by switching
the environment file used at deployment time.&lt;/p&gt;
&lt;section id="deployment-flow"&gt;
&lt;h3&gt;Deployment Flow&lt;/h3&gt;
&lt;p&gt;The deployment and configuration described in this spec will
happen before &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt;, as described in
&lt;a class="footnote-reference brackets" href="#id9" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. This is consistent with how ceph-ansible used to run during
step2 to configure these services. However, parts of the Manila
configuration which use Ganesha will still happen when &lt;cite&gt;openstack
overcloud deploy&lt;/cite&gt; is run. This is because some of the configuration
for Ganesha and Manila needs to happen during step 5. Thus, files like
&lt;cite&gt;environments/manila-cephfsganesha-config.yaml&lt;/cite&gt; will be updated to
trigger the new required actions.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;fmount&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;fultonj&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gfidente&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create a set of tasks to deploy on overcloud nodes the Ganesha config file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a set of tasks to trigger cephadm with special arguments&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The tripleo-ceph spec &lt;a class="footnote-reference brackets" href="#id9" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Testing is currently impossible as we only have one network while for Ganesha
we require at least two, one which connects it to the Ceph public network and
another where the NFS proxy service is exposed to tenants.&lt;/p&gt;
&lt;p&gt;This is a design decision, one of the values added by the use of an NFS proxy
for CephFS is to implement network isolation in between the tenant guests and
the actual Ceph cluster.&lt;/p&gt;
&lt;p&gt;Such a limitation does not come from the migration to cephadm &lt;a class="footnote-reference brackets" href="#id10" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; but it has
always existed; the code which enforces the use of two isolated networks is in
fact in TripleO, not in the Ceph tool itself. We might revisit this in the
future but it is not a goal of this spec to change this.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;No changes should be necessary to the TripleO documentation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id9" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id2"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id6"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id7"&gt;3&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ceph.html"&gt;tripleo-ceph&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id10" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id3"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id4"&gt;3&lt;/a&gt;,&lt;a role="doc-backlink" href="#id8"&gt;4&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/ceph/ceph/tree/master/src/cephadm"&gt;cephadm&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id11" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ceph-client.html"&gt;tripleo-ceph-client&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Fri, 23 Oct 2020 00:00:00 </pubDate></item><item><title>Modify TripleO Ironic Inspector to PXE Boot Via DHCP Relay</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/tripleo-routed-networks-ironic-inspector.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-ironic-inspector"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-ironic-inspector&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This blueprint is part of the series tripleo-routed-networks-deployment &lt;a class="footnote-reference brackets" href="#id10" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;0&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This spec describes adding features to the Undercloud to support Ironic
Inspector performing PXE boot services for multiple routed subnets (with
DHCP relay on the routers forwarding the requests). The changes required
to support this will be in the format of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;undercloud.conf&lt;/span&gt;&lt;/code&gt; and in the Puppet
script that writes the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt; configuration for Ironic Inspector.&lt;/p&gt;
&lt;p&gt;TripleO uses Ironic Inspector to perform baremetal inspection of overcloud
nodes prior to deployment. Today, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt; that is used by Ironic
Inspector is generated by Puppet scripts that run when the Undercloud is
configured. A single subnet and IP allocation range is entered in
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;undercloud.conf&lt;/span&gt;&lt;/code&gt; in the parameter &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;inspection_iprange&lt;/span&gt;&lt;/code&gt;. This spec would
implement support for multiple subnets in one provisioning network.&lt;/p&gt;
&lt;section id="background-context"&gt;
&lt;h2&gt;Background Context&lt;/h2&gt;
&lt;p&gt;For a detailed description of the desired topology and problems being
addressed, please reference the parent blueprint
tripleo-routed-networks-deployment &lt;a class="footnote-reference brackets" href="#id10" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;0&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="problem-descriptions"&gt;
&lt;h2&gt;Problem Descriptions&lt;/h2&gt;
&lt;p&gt;Ironic Inspector DHCP doesn’t yet support DHCP relay. This makes it
difficult to do introspection when the hosts are not on the same L2 domain
as the controllers.  The dnsmasq process will actually function across a DHCP
relay, but the configuration must be edited by hand.&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas, or Approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Add support for DHCP scopes and support for DHCP relays.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use remote DHCP/PXE boot but provide L3 routes back to the introspection server&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Neutron DHCP agent to PXE boot nodes for introspection (the Neutron
dhcp-agent already supports multiple subnets, and can be modified to support
DHCP relay). Note that there has been discussion about moving to Neutron for
Ironic Introspection on this bug &lt;a class="footnote-reference brackets" href="#id13" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. This is currently infeasible due to
Neutron not being able to issue IPs for unknown MACs. The related patch has
been abandoned &lt;a class="footnote-reference brackets" href="#id15" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Solution Implementation&lt;/p&gt;
&lt;p&gt;The Ironic Inspector DHCP server uses dnsmasq, but only configures one subnet.
We need to modify the Ironic Inspector DHCP configuration so that we can
configure DHCP for multiple Neutron subnets and allocation pools. Then we
should be able to use DHCP relay to send DHCP requests to the Ironic
Inspector DHCP server. In the long term, we can likely leverage the Routed
Networks work being done in Neutron to represent the subnets and allocation
pools that would be used for the DHCP range sets below. This spec only covers
the minimum needed for TripleO, so the work can be achieved simply by modifying
the Undercloud Puppet scripts. The following has been tested and shown
to result in successful introspection across two subnets, one local and one
across a router configured with DHCP relay:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;Current&lt;/span&gt; &lt;span class="n"&gt;dnsmasq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="n"&gt;representing&lt;/span&gt; &lt;span class="n"&gt;one&lt;/span&gt; &lt;span class="n"&gt;network&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;which&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt;
&lt;span class="n"&gt;configured&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="s2"&gt;"inspection_iprange"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;undercloud&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;br&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ctlplane&lt;/span&gt;
  &lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interfaces&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;172.21.0.100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.21.0.120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;sequential&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ipxe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;175&lt;/span&gt;
  &lt;span class="c1"&gt;# Client is running iPXE; move to next stage of chainloading&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;boot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;ipxe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8088&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;inspector&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ipxe&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;boot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;undionly&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kpxe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;localhost&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;localdomain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.1&lt;/span&gt;

&lt;span class="n"&gt;Multiple&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;subnet&lt;/span&gt; &lt;span class="n"&gt;dnsmasq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="n"&gt;representing&lt;/span&gt; &lt;span class="n"&gt;multiple&lt;/span&gt; &lt;span class="n"&gt;subnets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;br&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ctlplane&lt;/span&gt;
  &lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interfaces&lt;/span&gt;
  &lt;span class="c1"&gt;# Ranges and options&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;172.21.0.100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.21.0.120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;leaf1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;255.255.255.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;leaf1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.254&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;leaf2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.19.0.100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.19.0.120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;255.255.255.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;leaf2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.19.0.254&lt;/span&gt;

  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;sequential&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ipxe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;175&lt;/span&gt;
  &lt;span class="c1"&gt;# Client is running iPXE; move to next stage of chainloading&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;boot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;ipxe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8088&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;inspector&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ipxe&lt;/span&gt;
  &lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;boot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;undionly&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kpxe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;localhost&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;localdomain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In the above configuration, a router is supplied for all subnets, including
the subnet to which the Undercloud is attached. Note that a router option is not
required for nodes on the same subnet as the inspector host, but generating one
automatically for that subnet is harmless.&lt;/p&gt;
&lt;p&gt;This file is created by the Puppet file located in &lt;a class="footnote-reference brackets" href="#id11" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. That is where the
changes will have to be made.&lt;/p&gt;
&lt;p&gt;As discussed above, using a remote DHCP/PXE server is a possibility only if we
have support in the top-of-rack switches, or if there is a system or VM
listening on the remote subnet to relay DHCP requests. This configuration of
dnsmasq will allow it to send DHCP offers to the DHCP relay, which forwards the
offer on to the requesting host. After the offer is accepted, the host can
communicate directly with the Undercloud, since it has already received the
proper gateway address for packets to be forwarded. It will send a DHCP request
directly based on the offer, and the DHCP ACK will be sent directly from the
Undercloud to the client. Downloading of the PXE images is then done via TFTP
and HTTP, not through the DHCP relay.&lt;/p&gt;
&lt;p&gt;An additional problem is that Ironic Inspector blacklists nodes that have
already been introspected using iptables rules blocking traffic from
particular MAC addresses. Since packets relayed via DHCP relay will come
from the MAC address of the router (not the original NIC that sent the packet),
we will need to blacklist MACs based on the contents of the relayed DHCP
packet. If possible, this blacklisting would be done using dnsmasq, which
would provide the ability to decode the DHCP Discover packets and act on the
contents. In order to do blacklisting directly with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq&lt;/span&gt;&lt;/code&gt; instead of
using iptables, we need to be able to influence the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq&lt;/span&gt;&lt;/code&gt; configuration
file.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;The proposed changes are discussed below.&lt;/p&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The Puppet modules will need to be refactored to output a multi-subnet
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt; from a list of subnets in undercloud.conf.&lt;/p&gt;
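&lt;p&gt;One possible shape for such a list is sketched below. Only &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;inspection_iprange&lt;/span&gt;&lt;/code&gt; is an
existing parameter named in this spec; the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;subnets&lt;/span&gt;&lt;/code&gt; option, the per-subnet
section names, and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;netmask&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gateway&lt;/span&gt;&lt;/code&gt; keys are hypothetical and
shown purely for illustration. The addresses are taken from the multiple-subnet
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt; example earlier in this spec.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Illustrative undercloud.conf layout (hypothetical option and section names)
[DEFAULT]
# Hypothetical: names of the per-subnet sections defined below
subnets = local,leaf1,leaf2

[local]
# Subnet the Undercloud is attached to; no relay involved
inspection_iprange = 172.21.0.100,172.21.0.120

[leaf1]
# Remote subnet reached through a router running DHCP relay
inspection_iprange = 172.20.0.100,172.20.0.120
netmask = 255.255.255.0
gateway = 172.20.0.254

[leaf2]
inspection_iprange = 172.19.0.100,172.19.0.120
netmask = 255.255.255.0
gateway = 172.19.0.254
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Under this assumed layout, each leaf section would map to one tagged
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dhcp-range&lt;/span&gt;&lt;/code&gt; line and one matching router &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dhcp-option&lt;/span&gt;&lt;/code&gt; line in the
generated &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;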
&lt;p&gt;The blacklisting functionality will need to be updated. Filtering by MAC
address won’t work for DHCP requests that are relayed by a router. In that
case, the source MAC address will be the router interface that sent the
relayed request. There are methods to blacklist MAC addresses within dnsmasq,
such as this configuration:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mac&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;blacklist&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="n"&gt;MAC&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ignore&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;blacklist&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Or this configuration:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="c1"&gt;# Never offer DHCP service to a machine whose Ethernet&lt;/span&gt;
&lt;span class="c1"&gt;# address is 11:22:33:44:55:66&lt;/span&gt;
&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;44&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;66&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;ignore&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The configuration could be placed into the main &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt; file, or into
a file in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/etc/dnsmasq.d/&lt;/span&gt;&lt;/code&gt;. Either way, dnsmasq will have to be restarted
in order to re-read the configuration files. This is due to a security feature
in dnsmasq to prevent foreign configuration being loaded as root. Since DHCP
has a built-in retry mechanism, the brief time it takes to restart dnsmasq
should not impact introspection, as long as we don’t restart dnsmasq too
many times in any 60-second period.&lt;/p&gt;
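&lt;p&gt;A minimal sketch of the drop-in approach follows. The file name and the MAC
addresses are hypothetical, and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;conf-dir&lt;/span&gt;&lt;/code&gt; line is only needed if the main
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt; does not already read &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/etc/dnsmasq.d/&lt;/span&gt;&lt;/code&gt;. Since dnsmasq
matches &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dhcp-host&lt;/span&gt;&lt;/code&gt; entries against the client hardware address carried inside
the DHCP packet, these entries are expected to keep working for requests that
arrive through a relay.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# In the main dnsmasq.conf: also read every file in this directory
conf-dir=/etc/dnsmasq.d

# /etc/dnsmasq.d/inspector-blacklist.conf (hypothetical file name)
# One entry per node that has already been introspected
dhcp-host=52:54:00:12:34:56,ignore
dhcp-host=52:54:00:12:34:57,ignore
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Keeping the blacklist in its own file would let the inspector rewrite that
file without touching the Puppet-generated &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt;, although a restart
is still required after each change.&lt;/p&gt;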
&lt;p&gt;It does not appear that the dnsmasq DBus interface can be used to set the
“dhcp-ignore” option for individual MAC addresses &lt;a class="footnote-reference brackets" href="#id14" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; &lt;a class="footnote-reference brackets" href="#id16" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;One alternative approach is to use DHCP servers to assign IP addresses on all
hosts on all interfaces. This would simplify configuration within the Heat
templates and environment files. Unfortunately, this was the original approach
of TripleO, and it was deemed insufficient by end-users, who wanted stability
of IP addresses, and didn’t want to have an external dependency on DHCP.&lt;/p&gt;
&lt;p&gt;Another approach which was considered was simply trunking all networks back
to the Undercloud, so that dnsmasq could respond to DHCP requests directly,
rather than requiring a DHCP relay. Unfortunately, this has already been
identified as being unacceptable by some large operators, who have network
architectures that make heavy use of L2 segregation via routers. This also
won’t work well in situations where there is geographical separation between
the VLANs, such as in split-site deployments.&lt;/p&gt;
&lt;p&gt;Another approach is to use the DHCP server functionality in the network switch
infrastructure to PXE boot systems via DHCP, then assign static IP addresses
once the PXE boot is complete. This approach would require configuration
at the switch level that influences where systems PXE boot, potentially opening
up a security hole that is not under the control of OpenStack. This approach
also doesn’t lend itself to automation that accounts for things like changes
to the PXE image that is being served to hosts.&lt;/p&gt;
&lt;p&gt;It is not necessary to use hardware routers to forward DHCP packets. There
are DHCP relay and DHCP proxy packages available for Linux. It is possible
to place a system or a VM on both the Provisioning network and the remote
network in order to forward DHCP requests. This might be one method for
implementing CI testing. Another method might trunk all remote provisioning
networks back to the Undercloud, with DHCP relay running on the Undercloud
forwarding to the local br-ctlplane.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;One of the major differences between spine-and-leaf and standard isolated
networking is that the various subnets are connected by routers, rather than
being completely isolated. This means that without proper ACLs on the routers,
private networks may be opened up to outside traffic.&lt;/p&gt;
&lt;p&gt;This should be addressed in the documentation, and it should be stressed that
ACLs should be in place to prevent unwanted network traffic. For instance, the
Internal API network is sensitive in that the database and message queue
services run on that network. It is supposed to be isolated from outside
connections. This can be achieved fairly easily if &lt;em&gt;supernets&lt;/em&gt; are used. For
example, if all Internal API subnets are part of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;172.19.0.0/16&lt;/span&gt;&lt;/code&gt; supernet, a
pair of ACL rules can allow only traffic between Internal API IPs (this is a simplified
example that could be applied on all Internal API router VLAN interfaces
or as a global ACL):&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="n"&gt;traffic&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;
&lt;span class="n"&gt;deny&lt;/span&gt; &lt;span class="n"&gt;traffic&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In the case of Ironic Inspector, the TFTP server is a potential point of
vulnerability. TFTP is inherently unauthenticated and does not include an
access control model. The network(s) where Ironic Inspector is operating
should be secured from remote access.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Deploying with spine-and-leaf will require additional parameters to
provide the routing information and multiple subnets required. This will have
to be documented. Furthermore, the validation scripts may need to be updated
to ensure that the configuration is validated, and that there is proper
connectivity between overcloud hosts.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Much of the traffic that today stays within layer 2 will cross layer
3 routing boundaries in this design. That adds some minimal latency and overhead,
although in practice the difference may not be noticeable. One important
consideration is that the routers must not be too overcommitted on their
uplinks, and the routers must be monitored to ensure that they are not acting
as a bottleneck, especially if complex access control lists are used.&lt;/p&gt;
&lt;p&gt;The DHCP process is not likely to be affected; however, delivery of system
images via TFTP may suffer a performance degradation. Since TFTP does not
deal well with packet loss, deployers will have to take care not to
oversaturate the links between routing switches.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;A spine-and-leaf deployment will be more difficult to troubleshoot than a
deployment that simply uses a set of VLANs. The deployer may need to have
more network expertise, or a dedicated network engineer may be needed to
troubleshoot in some cases.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Spine-and-leaf is not easily tested in virt environments. This should be
possible, but due to the complexity of setting up libvirt bridges and
routes, we may want to provide a simulation of spine-and-leaf for use in
virtual environments. This may involve building multiple libvirt bridges
and routing between them on the Undercloud, or it may involve using a
DHCP relay on the virt-host as well as routing on the virt-host to simulate
a full routing switch. A plan for development and testing will need to be
formed, since not every developer can be expected to have a routed
environment to work in. It may take some time to develop a routed virtual
environment, so initial work will be done on bare metal.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Dan Sneddon &amp;lt;&lt;a class="reference external" href="mailto:dsneddon%40redhat.com"&gt;dsneddon&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Final assignees to be determined.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="approver-s"&gt;
&lt;h3&gt;Approver(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary approver:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Emilien Macchi &amp;lt;&lt;a class="reference external" href="mailto:emacchi%40redhat.com"&gt;emacchi&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Modify Ironic Inspector &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt; generation to allow export of
multiple DHCP ranges. The patch enabling this has merged &lt;a class="footnote-reference brackets" href="#id17" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the Ironic Inspector blacklisting mechanism so that it supports DHCP
relay, since the DHCP requests forwarded by the router will have the source
MAC address of the router, not the node being deployed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the documentation in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-docs&lt;/span&gt;&lt;/code&gt; to cover the spine-and-leaf case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add an upstream CI job to test booting across subnets (although
hardware availability may make this a long-term goal).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;[*] Note that depending on the timeline for Neutron/Ironic integration, it might
make sense to implement support for multiple subnets via changes to the Puppet
modules which process &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;undercloud.conf&lt;/span&gt;&lt;/code&gt; first, then follow up with a patch
to integrate Neutron networks into Ironic Inspector later on.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation-details"&gt;
&lt;h3&gt;Implementation Details&lt;/h3&gt;
&lt;p&gt;Workflow for introspection and deployment:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Network Administrator configures all provisioning VLANs with the IP address of
the Undercloud server on the ctlplane network as DHCP relay or “helper-address”.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operator configures IP address ranges and default gateways in
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;undercloud.conf&lt;/span&gt;&lt;/code&gt;. Each subnet will require its own IP address range.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operator imports baremetal instackenv.json.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When introspection or deployment is run, the DHCP server receives the DHCP
request from the baremetal host via DHCP relay.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the node has not been introspected, the DHCP server replies with an IP
address from the introspection pool and the inspector PXE boot image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introspection is performed. LLDP collection &lt;a class="footnote-reference brackets" href="#id12" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; is performed to gather
information about attached network ports.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The node is blacklisted in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt; (or in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/etc/dnsmasq.d&lt;/span&gt;&lt;/code&gt;),
and dnsmasq is restarted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On the next boot, if the MAC address is blacklisted and a port exists in
Neutron, then Neutron replies with the IP address from the Neutron port
and the overcloud-full deployment image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Heat templates are processed to generate os-net-config templates, and
os-net-config is run to assign static IPs from the correct subnets, as well
as routes to other subnets via the router gateway addresses.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When using spine-and-leaf, the DHCP server will need to provide an introspection
IP address on the appropriate subnet, depending on the information contained in
the DHCP relay packet that is forwarded by the segment router. dnsmasq will
automatically match the gateway address (GIADDR) of the router that forwarded
the request to the subnet where the DHCP request was received, and will respond
with an IP and gateway appropriate for that subnet.&lt;/p&gt;
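&lt;p&gt;As a worked illustration of this matching, the annotated fragment below reuses
the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;leaf1&lt;/span&gt;&lt;/code&gt; range from the multiple-subnet configuration earlier in this spec.
The relay address named in the comments is an assumption: the spec lists
172.20.0.254 only as the router option for that subnet, and it is assumed here
that the same interface performs the DHCP relay.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# A relayed DHCP Discover arrives with giaddr=172.20.0.254
# (assumed to be the leaf1 router interface acting as relay).
# dnsmasq selects the dhcp-range whose network contains that giaddr.
# The explicit netmask lets dnsmasq work out the network for a subnet
# it is not directly attached to.
dhcp-range=set:leaf1,172.20.0.100,172.20.0.120,255.255.255.0,29
# The selected range sets the tag "leaf1", so only the router option
# carrying that tag is included in the reply.
dhcp-option=tag:leaf1,option:router,172.20.0.254
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A request that arrives directly on br-ctlplane carries no relay address, so it
would instead be served from the untagged range for the local subnet.&lt;/p&gt;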
&lt;p&gt;The above workflow for the DHCP server should allow for provisioning IPs on
multiple subnets.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;There will be a dependency on routing switches that perform DHCP relay service
for production spine-and-leaf deployments. Since we will not have routing
switches in our virtual testing environment, a DHCP proxy may be set up as
described in the testing section below.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;In order to properly test this framework, we will need to establish at least
one CI test that deploys spine-and-leaf. As discussed in this spec, it isn’t
necessary to have a full routed bare metal environment in order to test this
functionality, although there is some work required to get it working in virtual
environments such as OVB.&lt;/p&gt;
&lt;p&gt;For virtual testing, it is sufficient to trunk all VLANs back to the
Undercloud, then run DHCP proxy on the Undercloud to receive all the
requests and forward them to br-ctlplane, where dnsmasq listens. This
will provide a substitute for routers running DHCP relay.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The TripleO docs will need to be updated to include detailed instructions
for deploying in a spine-and-leaf environment, including the environment
setup. Covering specific vendor implementations of switch configurations
is outside this scope, but a specific overview of required configuration
options should be included, such as enabling DHCP relay (or “helper-address”
as it is also known) and setting the Undercloud as a server to receive
DHCP requests.&lt;/p&gt;
&lt;p&gt;The updates to TripleO docs will also have to include a detailed discussion
of choices to be made about IP addressing before a deployment. If supernets
are to be used for network isolation, then a good plan for IP addressing will
be required to ensure scalability in the future.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id10" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;0&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/225384/6/specs/mitaka/routed-networks.rst"&gt;Spec: Routed Networks for Neutron&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id11" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/puppet-ironic/blob/master/templates/inspector_dnsmasq_http.erb"&gt;Source Code: inspector_dnsmasq_http.erb&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id12" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id9"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/374381"&gt;Review: Add LLDP processing hook and new CLI commands&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id13" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://bugs.launchpad.net/ironic/+bug/1658964"&gt;Bug: [RFE] Implement neutron routed networks support in Ironic&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id14" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id6"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://en.wikibooks.org/wiki/Python_Programming/Dbus"&gt;Wikibooks: Python Programming: DBus&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id15" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id4"&gt;5&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/248931/"&gt;Review: Enhanced Network/Subnet DHCP Options&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id16" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;6&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.thekelleys.org.uk/dnsmasq/docs/DBus-interface"&gt;Documentation: DBus Interface for dnsmasq&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id17" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id8"&gt;7&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/436716/"&gt;Review: Multiple DHCP Subnets for Ironic Inspector&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Mon, 19 Oct 2020 00:00:00 </pubDate></item><item><title>TripleO Routed Networks Deployment (Spine-and-Leaf Clos)</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/queens/tripleo-routed-networks-deployment.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-deployment"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-deployment&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;TripleO uses shared L2 networks today, so each node is attached to the
provisioning network, and any other networks are also shared. This
significantly reduces the complexity required to deploy on bare metal,
since DHCP and PXE booting are simply done over a shared broadcast domain.
This also makes the network switch configuration easy, since there is only
a need to configure VLANs and ports, but no added complexity from dynamic
routing between all switches.&lt;/p&gt;
&lt;p&gt;This design has limitations, however, and becomes unwieldy beyond a certain
scale. As the number of nodes increases, the background Broadcast, Unknown
unicast, and Multicast (BUM) traffic also increases. This design also requires
all top-of-rack switches to trunk the VLANs back to the core switches, which
centralizes the layer 3 gateway, usually on a single core switch. That creates
a bottleneck which is not present in a Clos architecture.&lt;/p&gt;
&lt;p&gt;This spec serves as a detailed description of the overall problem set, and
applies to the master blueprint. The sub-blueprints for the various
implementation items also have their own associated spec.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Where possible, modern high-performance datacenter networks typically use
routed networking to increase scalability and reduce failure domains. Using
routed networks makes it possible to optimize a Clos (also known as
“spine-and-leaf”) architecture for scalability:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;=========.&lt;/span&gt;                        &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;=========.&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;spine&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;__&lt;/span&gt;                    &lt;span class="n"&gt;__&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;spine&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="s1"&gt;'==|\=====\_ \__________________/ _/=====/|=='&lt;/span&gt;
   &lt;span class="o"&gt;|&lt;/span&gt; \&lt;span class="n"&gt;_&lt;/span&gt;     \&lt;span class="n"&gt;___&lt;/span&gt;   &lt;span class="o"&gt;/&lt;/span&gt;       \  &lt;span class="n"&gt;___&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;     &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="o"&gt;^&lt;/span&gt;
   &lt;span class="o"&gt;|&lt;/span&gt;    \&lt;span class="n"&gt;___&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;    \ &lt;span class="n"&gt;_______&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;   \ &lt;span class="n"&gt;___&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="o"&gt;|--&lt;/span&gt; &lt;span class="n"&gt;Dynamic&lt;/span&gt; &lt;span class="n"&gt;routing&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BGP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OSPF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="o"&gt;/&lt;/span&gt;  \       &lt;span class="o"&gt;/&lt;/span&gt;       \      &lt;span class="o"&gt;/&lt;/span&gt;  \    &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="n"&gt;v&lt;/span&gt;   &lt;span class="n"&gt;EIGRP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;------.&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;------.&lt;/span&gt;      &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;------.&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;------.&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;leaf&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|....|&lt;/span&gt;&lt;span class="n"&gt;leaf&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;leaf&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;|....|&lt;/span&gt;&lt;span class="n"&gt;leaf&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="o"&gt;========&lt;/span&gt; &lt;span class="n"&gt;Layer&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="n"&gt;boundary&lt;/span&gt;
&lt;span class="s1"&gt;'------'&lt;/span&gt;    &lt;span class="s1"&gt;'------'&lt;/span&gt;      &lt;span class="s1"&gt;'------'&lt;/span&gt;    &lt;span class="s1"&gt;'------'&lt;/span&gt;
   &lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="o"&gt;|&lt;/span&gt;             &lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="o"&gt;|&lt;/span&gt;
   &lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="o"&gt;|&lt;/span&gt;             &lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="o"&gt;|&lt;/span&gt;
   &lt;span class="o"&gt;|-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;serv&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;A1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=-|&lt;/span&gt;             &lt;span class="o"&gt;|-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;serv&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;B1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=-|&lt;/span&gt;
   &lt;span class="o"&gt;|-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;serv&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;A2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=-|&lt;/span&gt;             &lt;span class="o"&gt;|-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;serv&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;B2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=-|&lt;/span&gt;
   &lt;span class="o"&gt;|-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;serv&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;A3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=-|&lt;/span&gt;             &lt;span class="o"&gt;|-&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;serv&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;B3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=-|&lt;/span&gt;
       &lt;span class="n"&gt;Rack&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;                     &lt;span class="n"&gt;Rack&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In the above diagram, each server is connected via an Ethernet bond to both
top-of-rack leaf switches, which are clustered and configured as a virtual
switch chassis. Each leaf switch is attached to each spine switch. Within each
rack, all servers share a layer 2 domain. The subnets are local to the rack,
and the default gateway is the top-of-rack virtual switch pair. Dynamic routing
between the leaf switches and the spine switches permits East-West traffic
between the racks.&lt;/p&gt;
&lt;p&gt;This is just one example of a routed network architecture. The layer 3 routing
could also be done only on the spine switches, or there may even be distribution
level switches that sit in between the top-of-rack switches and the routed core.
The distinguishing feature that we are trying to enable is segregating local
systems within a layer 2 domain, with routing between domains.&lt;/p&gt;
&lt;p&gt;In a shared layer-2 architecture, the spine switches typically have to operate in
active/passive mode to serve as the L3 gateway for the single shared VLAN. All
leaf switches must be attached to the active switch, and the limit on North-South
bandwidth is the connection to the active switch, so there is an upper bound on
the scalability. The Clos topology is favored because it provides horizontal
scalability. Additional spine switches can be added to increase East-West and
North-South bandwidth. Equal-cost multipath routing between switches ensures
that all links are utilized simultaneously. If all ports are full on the spine
switches, an additional tier can be added to connect additional spines,
each with its own set of leaf switches, providing hyperscale expandability.&lt;/p&gt;
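&lt;p&gt;To illustrate how equal-cost multipath routing spreads traffic over all
uplinks, here is a minimal sketch of hash-based next-hop selection. It assumes
a per-flow hash over the usual 5-tuple; real switches use their own hardware
hash functions, so SHA-256 and the names below are stand-ins for illustration
only.&lt;/p&gt;

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick one of several equal-cost next hops for a flow.

    flow is a (src_ip, dst_ip, protocol, src_port, dst_port) tuple.
    Hashing the whole tuple keeps every packet of a flow on one path.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical leaf switch with two equal-cost uplinks.
spines = ["spine1", "spine2"]
flow = ("172.19.0.10", "172.20.0.20", "tcp", 49152, 443)
chosen = ecmp_next_hop(flow, spines)
print("flow pinned to", chosen)

# Many distinct flows end up spread across both spines.
used = {
    ecmp_next_hop(("172.19.0.10", "172.20.0.20", "tcp", port, 443), spines)
    for port in range(49152, 49352)
}
print("spines in use:", sorted(used))
```

&lt;p&gt;Because the choice depends only on the flow, adding a spine changes the
modulus and adds capacity without any per-flow configuration.&lt;/p&gt;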
&lt;p&gt;Each network device may be taken out of service for maintenance without the entire
network being offline. This topology also allows the switches to be configured
without physical loops or Spanning Tree, since the redundant links are either
delivered via bonding or via multiple layer 3 uplink paths with equal metrics.
Some advantages of using this architecture with separate subnets per rack are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Reduced domain for broadcast, unknown unicast, and multicast (BUM) traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reduced failure domain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Geographical separation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Association between IP address and rack location.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better cross-vendor support for multipath forwarding, using equal-cost
multipath (ECMP) via L3 routing instead of a proprietary “fabric”.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This topology is significantly different from the shared-everything approach that
TripleO takes today.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="problem-descriptions"&gt;
&lt;h2&gt;Problem Descriptions&lt;/h2&gt;
&lt;p&gt;As this is a complex topic, it will be easier to break the problems down into
their constituent parts, based on which part of TripleO they affect:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problem #1: TripleO uses DHCP/PXE on the Undercloud provisioning net (ctlplane).&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Neutron on the undercloud does not yet support DHCP relays and multiple L2
subnets, since it does DHCP/PXE directly on the provisioning network.&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas, or Approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Modify Ironic and/or Neutron to support multiple DHCP ranges in the dnsmasq
configuration, and use a DHCP relay running on the top-of-rack switches, which
receives DHCP requests and forwards them to dnsmasq on the Undercloud.
There is a patch in progress to support that &lt;a class="footnote-reference brackets" href="#id29" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;11&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify Neutron to support DHCP relay. There is a patch in progress to
support that &lt;a class="footnote-reference brackets" href="#id28" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;10&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Currently, if one adds a subnet to a network, Neutron DHCP agent will pick up
the changes and configure separate subnets correctly in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq&lt;/span&gt;&lt;/code&gt;. For instance,
after adding a second subnet to the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ctlplane&lt;/span&gt;&lt;/code&gt; network, here is the resulting
startup command for Neutron’s instance of dnsmasq:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;dnsmasq&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;hosts&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;resolv&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;strict&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lo&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=/&lt;/span&gt;&lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;neutron&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;aae53442&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;204e-4&lt;/span&gt;&lt;span class="n"&gt;c8e&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;a84&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="n"&gt;baaeb496cf&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;pid&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;hostsfile&lt;/span&gt;&lt;span class="o"&gt;=/&lt;/span&gt;&lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;neutron&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;aae53442&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;204e-4&lt;/span&gt;&lt;span class="n"&gt;c8e&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;a84&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="n"&gt;baaeb496cf&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;addn&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;hosts&lt;/span&gt;&lt;span class="o"&gt;=/&lt;/span&gt;&lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;neutron&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;aae53442&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;204e-4&lt;/span&gt;&lt;span class="n"&gt;c8e&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;a84&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="n"&gt;baaeb496cf&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;addn_hosts&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;optsfile&lt;/span&gt;&lt;span class="o"&gt;=/&lt;/span&gt;&lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;neutron&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;aae53442&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;204e-4&lt;/span&gt;&lt;span class="n"&gt;c8e&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;a84&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="n"&gt;baaeb496cf&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;leasefile&lt;/span&gt;&lt;span class="o"&gt;=/&lt;/span&gt;&lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;neutron&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;aae53442&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;204e-4&lt;/span&gt;&lt;span class="n"&gt;c8e&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;a84&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="n"&gt;baaeb496cf&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;leases&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;ipxe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;175&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interfaces&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tap4ccef953&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e0&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tag0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;static&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tag1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;static&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;force&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;mtu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1500&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;dhcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;lease&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt; \
&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;dnsmasq&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ironic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;openstacklocal&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The router information is written to the dhcp-optsfile. Here are the contents
of /var/lib/neutron/dhcp/aae53442-204e-4c8e-8a84-55baaeb496cf/opts:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tag0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;classless&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;static&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0.0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.19.0.254&lt;/span&gt;
&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tag0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;249&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0.0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.19.0.254&lt;/span&gt;
&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tag0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.19.0.254&lt;/span&gt;
&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tag1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;classless&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;static&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;169.254.169.254&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0.0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.254&lt;/span&gt;
&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tag1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;249&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;169.254.169.254&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0.0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.0.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.254&lt;/span&gt;
&lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tag1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;172.20.0.254&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The above options file will result in separate routers being handed out to
separate IP subnets. Furthermore, Neutron appears to “do the right thing” with
regard to routes for other subnets on the same network. We can see that the
option “classless-static-route” is given, with pointers to both the default
route and the other subnet(s) on the same Neutron network.&lt;/p&gt;
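&lt;p&gt;The structure of the options file can be made explicit with a small parser.
The sketch below is not part of Neutron. It assumes the comma-separated layout
shown above, uses a subset of those lines (the duplicate option 249 entries are
left out), and treats the classless-static-route values as alternating
destination and next-hop pairs.&lt;/p&gt;

```python
OPTS = """\
tag:tag0,option:classless-static-route,172.20.0.0/24,0.0.0.0,0.0.0.0/0,172.19.0.254
tag:tag0,option:router,172.19.0.254
tag:tag1,option:classless-static-route,169.254.169.254/32,172.20.0.1,172.19.0.0/24,0.0.0.0,0.0.0.0/0,172.20.0.254
tag:tag1,option:router,172.20.0.254
"""


def parse_opts(text):
    """Return {tag: {"router": str, "routes": [(destination, next_hop)]}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split(",")
        tag = fields[0].split(":", 1)[1]
        option, values = fields[1], fields[2:]
        entry = result.setdefault(tag, {"router": None, "routes": []})
        if option == "option:router":
            entry["router"] = values[0]
        elif option == "option:classless-static-route":
            # Values alternate: destination CIDR, then next hop.
            entry["routes"] = list(zip(values[0::2], values[1::2]))
    return result


parsed = parse_opts(OPTS)
for tag, entry in parsed.items():
    print(tag, "default router:", entry["router"])
    for destination, next_hop in entry["routes"]:
        print("   route", destination, "via", next_hop)
```

&lt;p&gt;Each tag corresponds to one dhcp-range on the dnsmasq command line, so a
client in 172.19.0.0 receives 172.19.0.254 as its router while a client in
172.20.0.0 receives 172.20.0.254.&lt;/p&gt;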
&lt;p&gt;In order to modify Ironic-Inspector to use multiple subnets, we will need to
extend instack-undercloud to support network segments. There is a patch in
review to support segments in instack undercloud &lt;a class="footnote-reference brackets" href="#id20" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;0&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Potential Workaround&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;One possibility is to use an alternate method to DHCP/PXE boot, such as using
DHCP configuration directly on the router, or to configure a host on the remote
network which provides DHCP and PXE URLs, then provides routes back to the
ironic-conductor and metadata server as part of the DHCP response.&lt;/p&gt;
&lt;p&gt;It is not always feasible for groups doing testing or development to configure
DHCP relay on the switches. For proof-of-concept implementations of
spine-and-leaf, we may want to configure all provisioning networks to be
trunked back to the Undercloud. This would allow the Undercloud to provide DHCP
for all networks without special switch configuration. In this case, the
Undercloud would act as a router between subnets/VLANs. This should be
considered a small-scale solution, as this is not as scalable as DHCP relay.
The configuration file for dnsmasq is the same whether all subnets are local or
remote, but dnsmasq may have to listen on multiple interfaces (today it only
listens on br-ctlplane). The dnsmasq process currently runs with
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--bind-interfaces&lt;/span&gt; &lt;span class="pre"&gt;--interface=tap-XXX&lt;/span&gt;&lt;/code&gt;, but the process will need to be run either
bound to multiple interfaces, or with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--except-interface=lo&lt;/span&gt;&lt;/code&gt; and multiple
interfaces bound to the namespace.&lt;/p&gt;
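&lt;p&gt;A minimal sketch of the first variant, where dnsmasq is bound to several
interfaces, is shown below. The helper and the interface names are
hypothetical. It only assembles a command line from options that already
appear in the startup command above, relying on dnsmasq accepting a repeated
interface option.&lt;/p&gt;

```python
def dnsmasq_args(interfaces, subnets, lease_seconds=86400):
    """Build a dnsmasq argument list serving several subnets.

    interfaces: names of the interfaces dnsmasq should bind to.
    subnets: network addresses, one dhcp-range (and tag) per subnet.
    """
    args = ["dnsmasq", "--no-hosts", "--no-resolv", "--bind-interfaces"]
    # One --interface option per trunked provisioning interface.
    args += ["--interface=%s" % name for name in interfaces]
    for index, network in enumerate(subnets):
        args.append(
            "--dhcp-range=set:tag%d,%s,static,%ds"
            % (index, network, lease_seconds)
        )
    return args


# Hypothetical interface names for two trunked provisioning VLANs.
args = dnsmasq_args(["br-ctlplane", "vlan20"], ["172.19.0.0", "172.20.0.0"])
print(" \\\n  ".join(args))
```

&lt;p&gt;The dhcp-range entries are identical to the relay case. Only the set of
interfaces changes, which matches the observation that the dnsmasq
configuration is the same whether subnets are local or remote.&lt;/p&gt;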
&lt;p&gt;For proof-of-concept deployments, as well as testing environments, it might
make sense to run a DHCP relay on the Undercloud, and trunk all provisioning
VLANs back to the Undercloud. This would allow dnsmasq to listen on the tap
interface, and DHCP requests would be forwarded to the tap interface. The
downside of this approach is that the Undercloud would need to have IP
addresses on each of the trunked interfaces.&lt;/p&gt;
&lt;p&gt;Another option is to configure dedicated hosts or VMs to be used as DHCP relay
and router for subnets on multiple VLANs, all of which would be trunked to the
relay/router host, thus acting exactly like routing switches.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;&lt;strong&gt;Problem #2: Neutron’s model for a segmented network that spans multiple L2
domains uses the segment object to allow multiple subnets to be assigned to
the same network. This functionality needs to be integrated into the
Undercloud.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas, or Approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Implement Neutron segments on the undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The spec for Neutron routed network segments &lt;a class="footnote-reference brackets" href="#id21" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; provides a schema that we can
use to model a routed network. By implementing support for network segments, we
can assign Ironic nodes to networks on routed subnets. This allows us
to continue to use Neutron for IP address management, as ports are assigned by
Neutron and tracked in the Neutron database on the Undercloud. See approach #1
below.&lt;/p&gt;
&lt;ol class="arabic simple" start="2"&gt;
&lt;li&gt;&lt;p&gt;Multiple Neutron networks (1 set per rack), to model all L2 segments.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By using a different set of networks in each rack, this provides us with
the flexibility to use different network architectures on a per-rack basis.
Each rack could have its own set of networks, and we would no longer have
to provide all networks in all racks. Additionally, a split-datacenter
architecture would naturally have a different set of networks in each
site, so this approach makes sense. This is detailed in approach #2 below.&lt;/p&gt;
&lt;ol class="arabic simple" start="3"&gt;
&lt;li&gt;&lt;p&gt;Multiple subnets per Neutron network.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is probably the best approach for provisioning, since Neutron is
already able to handle DHCP relay with multiple subnets as part of the
same network. Additionally, this allows a clean separation between local
subnets associated with provisioning, and networks which are used
in the overcloud (such as External networks in two different datacenters).
This is covered in more detail in approach #3 below.&lt;/p&gt;
&lt;ol class="arabic simple" start="4"&gt;
&lt;li&gt;&lt;p&gt;Use another system for IPAM, instead of Neutron.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Although we could use a database, flat file, or some other method to keep
track of IP addresses, Neutron as an IPAM back-end provides many integration
benefits. Neutron integrates DHCP, hardware switch port configuration (through
the use of plugins), integration in Ironic, and other features such as
IPv6 support. This has been deemed to be infeasible due to the level of effort
required in replacing both Neutron and the Neutron DHCP server (dnsmasq).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Approaches to Problem #2:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Approach 1 (Implement Neutron segments on the Undercloud):&lt;/p&gt;
&lt;p&gt;The Neutron segments model provides a schema in Neutron that allows us to
model the routed network. Using multiple subnets provides the flexibility
we need without creating exponentially more resources. We would create the same
provisioning network that we do today, but use multiple segments associated
to different routed subnets. The disadvantage to this approach is that it makes
it impossible to represent network VLANs with more than one IP subnet (Neutron
technically supports more than one subnet per port). Currently TripleO only
supports a single subnet per isolated network, so this should not be an issue.&lt;/p&gt;
&lt;p&gt;Approach 2 (Multiple Neutron networks (1 set per rack), to model all L2 segments):&lt;/p&gt;
&lt;p&gt;We will be using multiple networks to represent isolated networks in multiple
L2 domains. One sticking point is that although Neutron will configure multiple
routes for multiple subnets within a given network, we need to be able to both
configure static IPs and routes, and be able to scale the network by adding
additional subnets after initial deployment.&lt;/p&gt;
&lt;p&gt;Since we control addresses and routes on the host nodes using a
combination of Heat templates and os-net-config, it is possible to use
static routes to supernets to provide L2 adjacency. This approach only
works for non-provisioning networks, since we rely on Neutron DHCP servers
providing routes to adjacent subnets for the provisioning network.&lt;/p&gt;
&lt;p&gt;Example:
Suppose 2 subnets are provided for the Internal API network: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;172.19.1.0/24&lt;/span&gt;&lt;/code&gt;
and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;172.19.2.0/24&lt;/span&gt;&lt;/code&gt;. We want all Internal API traffic to traverse the Internal
API VLANs on both the controller and a remote compute node. The Internal API
network uses different VLANs for the two nodes, so we need the routes on the
hosts to point toward the Internal API gateway instead of the default gateway.
This can be provided by a supernet route to 172.19.x.x pointing to the local
gateway on each subnet (e.g. 172.19.1.1 and 172.19.2.1 on the respective
subnets). This could be represented in os-net-config with the following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="o"&gt;-&lt;/span&gt;
  &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;nic3&lt;/span&gt;
  &lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt;
      &lt;span class="n"&gt;ip_netmask&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InternalApiIpSubnet&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt;
      &lt;span class="n"&gt;ip_netmask&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InternalApiSupernet&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;next_hop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InternalApiRouter&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Where InternalApiIpSubnet is the IP address on the local subnet,
InternalApiSupernet is ‘172.19.0.0/16’, and InternalApiRouter is either
172.19.1.1 or 172.19.2.1 depending on which local subnet the host belongs to.&lt;/p&gt;
&lt;p&gt;The end result of this is that each host has a set of IP addresses and routes
that isolate traffic by function. In order for the return traffic to also be
isolated by function, similar routes must exist on both hosts, pointing to the
local gateway on the local subnet for the larger supernet that contains all
Internal API subnets.&lt;/p&gt;
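&lt;p&gt;As a rough illustration of why the supernet route keeps Internal API traffic
on the Internal API VLAN, the sketch below applies longest-prefix matching to a
small routing table for the remote compute node. The default gateway address
(192.0.2.1) and the destination addresses are assumptions made for the example;
only the supernet and the 172.19.2.1 gateway come from the text above.&lt;/p&gt;

```python
import ipaddress

# Illustrative routing table for a compute node on 172.19.2.0/24.
# The default gateway address is a hypothetical value.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),
    (ipaddress.ip_network("172.19.0.0/16"), "172.19.2.1"),
]


def next_hop(destination, routes=ROUTES):
    """Pick the next hop by longest-prefix match, as an IP stack would."""
    address = ipaddress.ip_address(destination)
    matching = [(net, hop) for net, hop in routes if address in net]
    best_net, best_hop = max(matching, key=lambda item: item[0].prefixlen)
    return best_hop


print(next_hop("172.19.1.10"))   # controller on the other Internal API subnet
print(next_hop("198.51.100.7"))  # unrelated traffic
```

&lt;p&gt;Because the /16 route is more specific than the default route, traffic to any
Internal API subnet leaves via the Internal API gateway, while everything else
still follows the default gateway.&lt;/p&gt;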
&lt;p&gt;The downside of this is that we must require proper supernetting, and this may
lead to larger blocks of IP addresses being used to provide ample space for
scaling growth. For instance, in the example above an entire /16 network is set
aside for up to 256 local /24 subnets for the Internal API network. This could be
changed into a more reasonable space, such as /18, if the number of local
subnets will not exceed 64, etc. This will be less of an issue with native IPv6
than with IPv4, where scarcity is much more likely.&lt;/p&gt;
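&lt;p&gt;The sizing trade-off can be checked with a few lines of arithmetic. The
sketch below counts how many local subnets fit in a candidate supernet,
assuming every local subnet is a /24 as in the example; the helper name is
hypothetical.&lt;/p&gt;

```python
import ipaddress


def local_subnet_capacity(supernet, local_prefix=24):
    """Count how many local subnets of the given prefix fit in a supernet."""
    network = ipaddress.ip_network(supernet)
    return 2 ** (local_prefix - network.prefixlen)


for candidate in ("172.19.0.0/16", "172.19.0.0/18", "172.19.0.0/20"):
    print(candidate, local_subnet_capacity(candidate))
```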
&lt;p&gt;Approach 3 (Multiple subnets per Neutron network):&lt;/p&gt;
&lt;p&gt;The approach we will use for the provisioning network will be to use multiple
subnets per network, using Neutron segments. This will allow us to take
advantage of Neutron’s ability to support multiple networks with DHCP relay.
The DHCP server will supply the necessary routes via DHCP until the nodes are
configured with a static IP post-deployment.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;&lt;strong&gt;Problem #3: Ironic introspection DHCP doesn’t yet support DHCP relay&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This makes it difficult to do introspection when the hosts are not on the same L2
domain as the controllers. Patches are either merged or in review to support
DHCP relay.&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas, or Approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;A patch to support a dnsmasq PXE filter driver has been merged. This will
allow us to support selective DHCP when using DHCP relay (where the packet
is not coming from the MAC of the host but rather the MAC of the switch)
&lt;a class="footnote-reference brackets" href="#id30" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;12&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A patch has been merged to puppet-ironic to support multiple DHCP subnets
for Ironic Inspector &lt;a class="footnote-reference brackets" href="#id31" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;13&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A patch is in review to add support for multiple subnets for the
provisioning network in the instack-undercloud scripts &lt;a class="footnote-reference brackets" href="#id32" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;14&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For more information about solutions, please refer to the
tripleo-routed-networks-ironic-inspector blueprint &lt;a class="footnote-reference brackets" href="#id23" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and spec &lt;a class="footnote-reference brackets" href="#id24" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;&lt;strong&gt;Problem #4: The IP addresses on the provisioning network need to be
static IPs for production.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas, or Approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Dan Prince wrote a patch &lt;a class="footnote-reference brackets" href="#id27" id="id10" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; in Newton to convert the ctlplane network
addresses to static addresses post-deployment. This will need to be
refactored to support multiple provisioning subnets across routers.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Solution Implementation:&lt;/p&gt;
&lt;p&gt;This work is done and merged for the legacy use case. During the
initial deployment, the nodes receive their IP address via DHCP, but during
Heat deployment the os-net-config script is called, which writes static
configuration files for the NICs with static IPs.&lt;/p&gt;
&lt;p&gt;This work will need to be refactored to support assigning IPs from the
appropriate subnet, but the work will be part of the TripleO Heat Template
refactoring listed in Problems #6 and #7 below.&lt;/p&gt;
&lt;p&gt;For the deployment model where the IPs are specified (ips-from-pool-all.yaml),
we need to develop a model where the Control Plane IP can be specified
on multiple deployment subnets. This may happen in a later cycle than the
initial work being done to enable routed networks in TripleO. For more
information, reference the tripleo-predictable-ctlplane-ips blueprint &lt;a class="footnote-reference brackets" href="#id25" id="id11" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;
and spec &lt;a class="footnote-reference brackets" href="#id26" id="id12" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;&lt;strong&gt;Problem #5: Heat Support For Routed Networks&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The Neutron routed networks extensions were only added in recent releases, and
there was a dependency on TripleO Heat Templates.&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas or Approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Add the required objects to Heat. At minimum, we will probably have to
add &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::Neutron::Segment&lt;/span&gt;&lt;/code&gt;, which represents layer 2 segments.
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::Neutron::Network&lt;/span&gt;&lt;/code&gt; would be updated to support the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;l2-adjacency&lt;/span&gt;&lt;/code&gt;
attribute, and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::Neutron::Subnet&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::Neutron::Port&lt;/span&gt;&lt;/code&gt; would be extended
to support the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;segment_id&lt;/span&gt;&lt;/code&gt; attribute.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Solution Implementation:&lt;/p&gt;
&lt;p&gt;Heat now supports the OS::Neutron::Segment resource. For example:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;heat_template_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2015&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;04&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="n"&gt;the_resource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Neutron&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Segment&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
      &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
      &lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
      &lt;span class="n"&gt;network_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
      &lt;span class="n"&gt;physical_network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
      &lt;span class="n"&gt;segmentation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This work has been completed in Heat with this review &lt;a class="footnote-reference brackets" href="#id33" id="id13" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;15&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;&lt;strong&gt;Problem #6: Static IP assignment: Choosing static IPs from the correct
subnet&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some roles, such as Compute, can likely be placed in any subnet, but we will
need to keep certain roles co-located within the same set of L2 domains. For
instance, whatever role is providing Neutron services will need all controllers
in the same L2 domain for VRRP to work properly.&lt;/p&gt;
&lt;p&gt;The network interfaces will be configured using templates that create
configuration files for os-net-config. The IP addresses that are written to each
node’s configuration will need to be on the correct subnet for each host. In
order for Heat to assign ports from the correct subnets, we will need to have a
host-to-subnets mapping.&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas or Approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The simplest implementation of this would probably be a mapping of role/index
to a set of subnets, so that it is known to Heat that Controller-1 is in
subnet set X and Compute-3 is in subnet set Y.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We could associate particular subnets with roles, and then use one role
per L2 domain (such as per-rack).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The roles and templates should be refactored to allow for dynamic IP
assignment within subnets associated with the role. We may wish to evaluate
the possibility of storing the routed subnets in Neutron using the routed
networks extensions that are still under development. This would provide
additional flexibility, but is probably not required to implement separate
subnets in each rack.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A scalable long-term solution is to map which subnet the host is on
during introspection. If we can identify the correct subnet for each
interface, then we can correlate that with IP addresses from the correct
allocation pool.  This would have the advantage of not requiring a static
mapping of role to node to subnet. In order to do this, additional
integration would be required between Ironic and Neutron (to make Ironic
aware of multiple subnets per network, and to add the ability to make
that association during introspection).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
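&lt;p&gt;A minimal sketch of the first idea in the list above, a mapping of
role/index to a set of subnets, might look like the following. The rack names,
the Storage ranges, and the function name are all hypothetical; this is not how
the Heat templates store the mapping, only an illustration of the lookup.&lt;/p&gt;

```python
import ipaddress

# Hypothetical subnet sets, one per rack; names and ranges are illustrative.
SUBNET_SETS = {
    "rack1": {"InternalApi": "172.19.1.0/24", "Storage": "172.18.1.0/24"},
    "rack2": {"InternalApi": "172.19.2.0/24", "Storage": "172.18.2.0/24"},
}

# Role/index to subnet-set mapping (solution 1 in the list above).
NODE_TO_SUBNET_SET = {
    ("Controller", 1): "rack1",
    ("Compute", 3): "rack2",
}


def subnet_for(role, index, network):
    """Look up the subnet a node should draw its address from."""
    subnet_set = NODE_TO_SUBNET_SET[(role, index)]
    return ipaddress.ip_network(SUBNET_SETS[subnet_set][network])


print(subnet_for("Controller", 1, "InternalApi"))
print(subnet_for("Compute", 3, "InternalApi"))
```

&lt;p&gt;The drawback noted in the list is visible here: every node needs an explicit
entry, which is why mapping subnets during introspection is described as the
more scalable long-term option.&lt;/p&gt;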
&lt;p&gt;Solution Implementation:&lt;/p&gt;
&lt;p&gt;Solutions 1 and 2 above have been implemented in the “composable roles” series
of patches &lt;a class="footnote-reference brackets" href="#id34" id="id14" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;16&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. The initial implementation uses separate Neutron networks
for different L2 domains. These templates are responsible for assigning the
isolated VLANs used for data plane and overcloud control planes, but do not
address the provisioning network. Future work may refactor the non-provisioning
networks to use segments, but for now non-provisioning networks must use
different networks for different roles.&lt;/p&gt;
&lt;p&gt;Ironic autodiscovery may allow us to determine the subnet where each node
is located without manual entry. More work is required to automate this
process.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;&lt;strong&gt;Problem #7: Isolated Networking Requires Static Routes to Ensure Correct VLAN
is Used&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In order to continue using the Isolated Networks model, routes will need to be
in place on each node, to steer traffic to the correct VLAN interfaces. The
routes are written when os-net-config first runs, but may change. We
can’t just rely on the specific routes to other subnets, since the number of
subnets will increase or decrease as racks are added or taken away. Rather than
try to deal with constantly changing routes, we should use static routes that
will not need to change, to avoid disruption on a running system.&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas or Approaches:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Require that supernets are used for various network groups. For instance,
all the Internal API subnets would be part of a supernet such as
172.17.0.0/16, broken up into many smaller subnets, such
as /24. This would simplify the routes, since only a single route for
172.17.0.0/16 would be required pointing to the local router on the
172.17.x.0/24 network.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify os-net-config so that routes can be updated without bouncing
interfaces, and then run os-net-config on all nodes when scaling occurs.
A review for this functionality was considered and abandoned &lt;a class="footnote-reference brackets" href="#id22" id="id15" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
The patch was determined to have the potential to lead to instability.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;os-net-config configures static routes for each interface. If we can keep the
routing simple (one route per functional network), then we would be able to
isolate traffic onto functional VLANs like we do today.&lt;/p&gt;
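&lt;p&gt;To make the contrast concrete, the sketch below compares per-subnet routes
with a single supernet route when a rack is added. It assumes the local router
holds the first usable address of each subnet; the rack subnets and helper
names are illustrative only.&lt;/p&gt;

```python
import ipaddress

SUPERNET = ipaddress.ip_network("172.17.0.0/16")


def first_host(subnet):
    """Assume the local router holds the first usable address of the subnet."""
    return str(next(iter(ipaddress.ip_network(subnet).hosts())))


def per_subnet_routes(local, all_subnets):
    """One route per remote subnet; this list changes whenever racks change."""
    gateway = first_host(local)
    return [(remote, gateway) for remote in all_subnets if remote != local]


def supernet_routes(local):
    """A single route to the supernet; independent of the number of racks."""
    return [(str(SUPERNET), first_host(local))]


racks = ["172.17.1.0/24", "172.17.2.0/24"]
specific_before = per_subnet_routes(racks[0], racks)
static_before = supernet_routes(racks[0])

racks.append("172.17.3.0/24")  # a new rack is added
specific_after = per_subnet_routes(racks[0], racks)
static_after = supernet_routes(racks[0])

print(specific_before != specific_after)
print(static_before == static_after)
```

&lt;p&gt;With per-subnet routes, every existing node would need its configuration
rewritten when a rack is added; with the supernet route, the existing nodes are
left untouched.&lt;/p&gt;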
&lt;p&gt;It would be a change to the existing workflow to have os-net-config run on
updates as well as deployment, but if this were a non-impacting event (the
interfaces didn’t have to be bounced), that would probably be OK.&lt;/p&gt;
&lt;p&gt;At a later time, the possibility of using dynamic routing should be considered,
since it reduces the possibility of user error and is better suited to
centralized management. SDN solutions are one way to provide this, or other
approaches may be considered, such as setting up OVS tunnels.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;The proposed changes are discussed below.&lt;/p&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;In order to provide spine-and-leaf networking for deployments, several changes
will have to be made to TripleO:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Support for DHCP relay in Ironic and Neutron DHCP servers. Implemented in
patch &lt;a class="footnote-reference brackets" href="#id33" id="id16" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;15&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and the patch series &lt;a class="footnote-reference brackets" href="#id35" id="id17" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;17&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Refactoring of TripleO Heat Templates network isolation to support multiple
subnets per isolated network, as well as per-subnet and supernet routes.
The bulk of this work is done in the patch series &lt;a class="footnote-reference brackets" href="#id34" id="id18" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;16&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and in patch &lt;a class="footnote-reference brackets" href="#id28" id="id19" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;10&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changes to Infra CI to support testing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation updates.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The approach outlined here is very prescriptive, in that the networks must be
known ahead of time, and the IP addresses must be selected from the appropriate
pool. This is due to the reliance on static IP addresses provided by Heat.&lt;/p&gt;
&lt;p&gt;One alternative approach is to use DHCP servers to assign IP addresses on all
hosts on all interfaces. This would simplify configuration within the Heat
templates and environment files. Unfortunately, this was the original approach
of TripleO, and it was deemed insufficient by end-users, who wanted stability
of IP addresses, and didn’t want to have an external dependency on DHCP.&lt;/p&gt;
&lt;p&gt;Another approach is to use the DHCP server functionality in the network switch
infrastructure in order to PXE boot systems via DHCP, then assign static IP
addresses after the PXE boot is done. This approach only solves for part of the
requirement: the net booting. It does not solve the desire to have static IP
addresses on each network. This could be achieved by having static IP addresses
in some sort of per-node map. However, this approach is not as scalable as
programmatically determining the IPs, since it only applies to a fixed number of
hosts. Ideally, we want to retain the ability to use Neutron as an IP address
management (IPAM) back-end.&lt;/p&gt;
&lt;p&gt;Another approach which was considered was simply trunking all networks back
to the Undercloud, so that dnsmasq could respond to DHCP requests directly,
rather than requiring a DHCP relay. Unfortunately, this has already been
identified as being unacceptable by some large operators, who have network
architectures that make heavy use of L2 segregation via routers. This also
won’t work well in situations where there is geographical separation between
the VLANs, such as in split-site deployments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;One of the major differences between spine-and-leaf and standard isolated
networking is that the various subnets are connected by routers, rather than
being completely isolated. This means that without proper ACLs on the routers,
networks which should be private may be opened up to outside traffic.&lt;/p&gt;
&lt;p&gt;This should be addressed in the documentation, and it should be stressed that
ACLs should be in place to prevent unwanted network traffic. For instance, the
Internal API network is sensitive in that the database and message queue
services run on that network. It is supposed to be isolated from outside
connections. This can be achieved fairly easily if &lt;em&gt;supernets&lt;/em&gt; are used, so
that if all Internal API subnets are a part of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;172.19.0.0/16&lt;/span&gt;&lt;/code&gt; supernet,
an ACL rule will allow only traffic between Internal API IPs (this is a
simplified example that could be applied to any Internal API VLAN, or as a
global ACL):&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="n"&gt;traffic&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;
&lt;span class="n"&gt;deny&lt;/span&gt; &lt;span class="n"&gt;traffic&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
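&lt;p&gt;The effect of the two rules can be sketched as a first-match evaluation. This
is only a model of the simplified ACL above, not the syntax or semantics of any
particular router; in particular, the default action for traffic that matches
no rule is an assumption, and real devices often default to deny.&lt;/p&gt;

```python
import ipaddress

INTERNAL_API = ipaddress.ip_network("172.19.0.0/16")
ANY = ipaddress.ip_network("0.0.0.0/0")

# (action, source, destination), evaluated top to bottom; first match wins.
ACL = [
    ("allow", INTERNAL_API, INTERNAL_API),
    ("deny", ANY, INTERNAL_API),
]


def evaluate(source, destination, acl=ACL, default="allow"):
    """Return the action of the first rule matching the packet.

    The default action is an assumption made for this sketch.
    """
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    for action, src_net, dst_net in acl:
        if src in src_net and dst in dst_net:
            return action
    return default


print(evaluate("172.19.1.10", "172.19.2.20"))  # Internal API to Internal API
print(evaluate("10.0.0.5", "172.19.2.20"))     # outside host to Internal API
```

&lt;p&gt;Rule order matters: if the deny rule were listed first, it would also match
traffic between Internal API hosts.&lt;/p&gt;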
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Deploying with spine-and-leaf will require additional parameters to
provide the routing information and multiple subnets required. This will have
to be documented. Furthermore, the validation scripts may need to be updated
to ensure that the configuration is validated, and that there is proper
connectivity between overcloud hosts.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Much of the traffic that today stays within layer 2 will traverse layer
3 routing boundaries in this design. That adds some minimal latency and overhead,
although in practice the difference may not be noticeable. One important
consideration is that the routers must not be too overcommitted on their
uplinks, and the routers must be monitored to ensure that they are not acting
as a bottleneck, especially if complex access control lists are used.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;A spine-and-leaf deployment will be more difficult to troubleshoot than a
deployment that simply uses a set of VLANs. The deployer may need to have
more network expertise, or a dedicated network engineer may be needed to
troubleshoot in some cases.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Spine-and-leaf is not easily tested in virt environments. This should be
possible, but due to the complexity of setting up libvirt bridges and
routes, we may want to provide a simulation of spine-and-leaf for use in
virtual environments. This may involve building multiple libvirt bridges
and routing between them on the Undercloud, or it may involve using a
DHCP relay on the virt-host as well as routing on the virt-host to simulate
a full routing switch. A plan for development and testing will need to be
developed, since not every developer can be expected to have a routed
environment to work in. It may take some time to develop a routed virtual
environment, so initial work will be done on bare metal.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Dan Sneddon &amp;lt;&lt;a class="reference external" href="mailto:dsneddon%40redhat.com"&gt;dsneddon&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="approver-s"&gt;
&lt;h3&gt;Approver(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary approver:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Emilien Macchi &amp;lt;&lt;a class="reference external" href="mailto:emacchi%40redhat.com"&gt;emacchi&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Add static IP assignment to Control Plane [DONE]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify Ironic Inspector &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dnsmasq.conf&lt;/span&gt;&lt;/code&gt; generation to allow export of
multiple DHCP ranges, as described in Problem #1 and Problem #3.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Evaluate the Routed Networks work in Neutron, to determine if it is required
for spine-and-leaf, as described in Problem #2.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add OS::Neutron::Segment and l2-adjacency support to Heat, as described
in Problem #5. This may or may not be a dependency for spine-and-leaf, based
on the results of work item #3.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the Ironic-Inspector service to record the host-to-subnet mappings,
perhaps during introspection, to address Problem #6.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add parameters to Isolated Networking model in Heat to support supernet
routes for individual subnets, as described in Problem #7.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify Isolated Networking model in Heat to support multiple subnets, as
described in Problem #8.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add support for setting routes to supernets in os-net-config NIC templates,
as described in the proposed solution to Problem #2.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement support for iptables on the Controller, in order to mitigate
the APIs potentially being reachable via remote routes. Alternatively,
document the mitigation procedure using ACLs on the routers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document the testing procedures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the documentation in tripleo-docs to cover the spine-and-leaf case.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="implementation-details"&gt;
&lt;h3&gt;Implementation Details&lt;/h3&gt;
&lt;p&gt;Workflow:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Operator configures DHCP networks and IP address ranges&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operator imports baremetal instackenv.json&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When introspection or deployment is run, the DHCP server receives the DHCP
request from the baremetal host via DHCP relay&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the node has not been introspected, reply with an IP address from the
introspection pool* and the inspector PXE boot image&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the node already has been introspected, then the server assumes this is
a deployment attempt, and replies with the Neutron port IP address and the
overcloud-full deployment image&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Heat templates are processed which generate os-net-config templates, and
os-net-config is run to assign static IPs from the correct subnets, as well
as routes to other subnets via the router gateway addresses.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The introspection pool will be different for each provisioning subnet.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When using spine-and-leaf, the DHCP server will need to provide an introspection
IP address on the appropriate subnet, depending on the information contained in
the DHCP relay packet that is forwarded by the segment router. dnsmasq will
automatically match the gateway address (GIADDR) of the router that forwarded
the request to the subnet where the DHCP request was received, and will respond
with an IP and gateway appropriate for that subnet.&lt;/p&gt;
&lt;p&gt;The above workflow for the DHCP server should allow for provisioning IPs on
multiple subnets.&lt;/p&gt;
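&lt;p&gt;The subnet selection described above can be illustrated with a minimal sketch.
dnsmasq performs this matching internally; the code below only mirrors the
logic and is not dnsmasq code. The subnet list, pool ranges, and the function
name subnet_for_giaddr are hypothetical values chosen for illustration.&lt;/p&gt;

```python
import ipaddress

# Hypothetical per-subnet introspection pools; the addresses are illustrative only.
INTROSPECTION_SUBNETS = [
    {"cidr": "192.168.10.0/24", "gateway": "192.168.10.1",
     "pool": ("192.168.10.100", "192.168.10.120")},
    {"cidr": "192.168.20.0/24", "gateway": "192.168.20.1",
     "pool": ("192.168.20.100", "192.168.20.120")},
]


def subnet_for_giaddr(giaddr, subnets=INTROSPECTION_SUBNETS):
    """Return the subnet entry whose CIDR contains the relay address, or None."""
    relay = ipaddress.ip_address(giaddr)
    for subnet in subnets:
        if relay in ipaddress.ip_network(subnet["cidr"]):
            return subnet
    return None
```

&lt;p&gt;A request relayed with a GIADDR of 192.168.20.1 would be answered from the
second pool in this sketch, while a relay address outside every configured
subnet yields no match.&lt;/p&gt;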
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;There may be a dependency on the Neutron Routed Networks. This won’t be clear
until a full evaluation is done on whether we can represent spine-and-leaf
using only multiple subnets per network.&lt;/p&gt;
&lt;p&gt;There will be a dependency on routing switches that perform DHCP relay service
for production spine-and-leaf deployments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;In order to properly test this framework, we will need to establish at least
one CI test that deploys spine-and-leaf. As discussed in this spec, it isn’t
necessary to have a full routed bare metal environment in order to test this
functionality, although there is some work to get it working in virtual
environments such as OVB.&lt;/p&gt;
&lt;p&gt;For bare metal testing, it is sufficient to trunk all VLANs back to the
Undercloud, then run DHCP proxy on the Undercloud to receive all the
requests and forward them to br-ctlplane, where dnsmasq listens. This
will provide a substitute for routers running DHCP relay. For Neutron
DHCP, some modifications to the iptables rules may be required to ensure
that all DHCP requests from the overcloud nodes are received by the
DHCP proxy and/or the Neutron dnsmasq process running in the dhcp-agent
namespace.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The procedure for setting up a dev environment will need to be documented,
and a work item mentions this requirement.&lt;/p&gt;
&lt;p&gt;The TripleO docs will need to be updated to include detailed instructions
for deploying in a spine-and-leaf environment, including the environment
setup. Covering specific vendor implementations of switch configurations
is outside this scope, but a specific overview of required configuration
options should be included, such as enabling DHCP relay (or “helper-address”
as it is also known) and setting the Undercloud as a server to receive
DHCP requests.&lt;/p&gt;
&lt;p&gt;The updates to TripleO docs will also have to include a detailed discussion
of choices to be made about IP addressing before a deployment. If supernets
are to be used for network isolation, then a good plan for IP addressing will
be required to ensure scalability in the future.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id20" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;0&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/437544"&gt;Review: TripleO Heat Templates: Tripleo routed networks ironic inspector, and Undercloud&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id21" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id4"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/neutron-specs/specs/newton/routed-networks.html"&gt;Spec: Routed Networks for Neutron&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id22" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id15"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/152732/"&gt;Review: Modify os-net-config to make changes without bouncing interface&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id23" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id8"&gt;5&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-ironic-inspector"&gt;Blueprint: Modify TripleO Ironic Inspector to PXE Boot Via DHCP Relay&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id24" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id9"&gt;6&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/421011"&gt;Spec: Modify TripleO Ironic Inspector to PXE Boot Via DHCP Relay&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id25" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id11"&gt;7&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-deployment"&gt;Blueprint: User-specifiable Control Plane IP on TripleO Routed Isolated Networks&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id26" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id12"&gt;8&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/421010"&gt;Spec: User-specifiable Control Plane IP on TripleO Routed Isolated Networks&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id27" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id10"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/206022/"&gt;Review: Configure ctlplane network with a static IP&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id28" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;10&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id2"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id19"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/438171"&gt;Review: Neutron: Make “on-link” routes for subnets optional&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id29" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;11&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/438175"&gt;Review: Ironic Inspector: Make “on-link” routes for subnets optional&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id30" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;12&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/466448"&gt;Review: Ironic Inspector: Introducing a dnsmasq PXE filter driver&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id31" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id6"&gt;13&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/436716"&gt;Review: Multiple DHCP Subnets for Ironic Inspector&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id32" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;14&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/533367"&gt;Review: Instack Undercloud: Add support for multiple inspection subnets&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id33" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;15&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id13"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id16"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/468744"&gt;Review: DHCP Agent: Separate local from non-local subnets&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id34" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;16&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id14"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id18"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/q/topic:bp/composable-networks+(status:open+OR+status:merged)"&gt;Review Series: topic:bp/composable-networks&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id35" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id17"&gt;17&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/q/project:openstack/networking-baremetal+committer:hjensas%2540redhat.com"&gt;Review Series: project:openstack/networking-baremetal&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Mon, 19 Oct 2020 00:00:00 </pubDate></item><item><title>TripleO Routed Networks Deployment (Spine-and-Leaf Clos)</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/tripleo-routed-networks-templates.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-templates"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-templates&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This blueprint is part of the series tripleo-routed-networks-deployment &lt;a class="footnote-reference brackets" href="#id16" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;0&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;TripleO uses shared L2 networks for all networks except the provisioning
network today. (Support for an L3 provisioning network was added in Queens.)&lt;/p&gt;
&lt;p&gt;L3 support on the provisioning network uses network segments, a concept
from Neutron routed networks. With network segments we can represent more than
one subnet per VLAN; without them, we would be limited to one subnet per VLAN.&lt;/p&gt;
&lt;p&gt;For the non-provisioning networks we have no way to model a true L3 routed
network in TripleO today. When deploying such an architecture we currently
create custom Neutron networks for all the different L2 segments of each
isolated network. While this approach works, it comes with some caveats.&lt;/p&gt;
&lt;p&gt;This spec covers refactoring the TripleO Heat Templates to support deployment
onto networks which are segregated into multiple layer 2 domains with routers
forwarding traffic between layer 2 domains.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The master blueprint for routed networks for deployments breaks the problem
set into multiple parts &lt;a class="footnote-reference brackets" href="#id16" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;0&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. This blueprint presents the problems which are
applicable to this blueprint below.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="problem-descriptions"&gt;
&lt;h2&gt;Problem Descriptions&lt;/h2&gt;
&lt;p&gt;Problem #1: Deploy systems onto a routed provisioning network.&lt;/p&gt;
&lt;p&gt;We can model a routed provisioning network and deploy systems on top of
that network today, but doing so requires additional complex configuration,
such as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Setting up the required static routes to ensure traffic within the L3
control plane takes the desired path throughout the network.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;L2 segments use different router addresses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;L2 segments may use different subnet masks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other L2 segment property differences.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;This configuration essentially means manually passing information into the
templates used to deploy the overcloud, even though that information was
already provided when deploying the undercloud. While this works, it increases
complexity and the possibility that the user provides incorrect configuration
data.&lt;/p&gt;
&lt;p&gt;We should be able to derive as much of this information as possible from what
was provided when deploying the undercloud.&lt;/p&gt;
&lt;p&gt;In order to support this model, there are some requirements that have to be
met in Heat and Neutron.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Alternative approaches to Problem #1:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Approach 1:&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This is what we currently do.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Since we control addresses and routes on the host nodes using a
combination of Heat templates and os-net-config, it may be possible to use
static routes to supernets to provide L2 adjacency, rather than relying on
Neutron to generate dynamic lists of routes that would need to be updated
on all hosts.&lt;/p&gt;
&lt;p&gt;The end result of this is that each host has a set of IP addresses and routes
that isolate traffic by function. In order for the return traffic to also be
isolated by function, similar routes must exist on both hosts, pointing to the
local gateway on the local subnet for the larger supernet that contains all
Internal API subnets.&lt;/p&gt;
&lt;p&gt;The downside of this is that we must require proper supernetting, and this may
lead to larger blocks of IP addresses being used to provide ample space for
scaling growth. For instance, in the Internal API example under Problem #3, an
entire /16 network is set aside for up to 256 local /24 subnets. This could be
changed into a more reasonable space, such as /18, if the number of local
subnets will not exceed 64, etc. This will be less of an issue with native IPv6
than with IPv4, where scarcity is much more likely.&lt;/p&gt;
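&lt;p&gt;The sizing trade-off can be checked with a short calculation. This is a
sketch that assumes every local subnet is a /24; the function name
local_subnet_count is hypothetical and not part of TripleO.&lt;/p&gt;

```python
import ipaddress


def local_subnet_count(supernet, subnet_prefix=24):
    """Count how many subnets of the given prefix length fit in the supernet."""
    network = ipaddress.ip_network(supernet)
    # subnets() raises ValueError if subnet_prefix is shorter than the supernet prefix.
    return sum(1 for _ in network.subnets(new_prefix=subnet_prefix))
```

&lt;p&gt;Under that assumption a /16 supernet holds 256 local /24 subnets and a /18
supernet holds 64, which is why a smaller supernet may be sufficient when the
number of racks is bounded.&lt;/p&gt;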
&lt;p&gt;Approach 2:&lt;/p&gt;
&lt;p&gt;Instead of passing parameters such as ControlPlaneCidr and
ControlPlaneDefaultRoute, implement Neutron RFE &lt;a class="footnote-reference brackets" href="#id20" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and Heat RFE &lt;a class="footnote-reference brackets" href="#id21" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. In
tripleo-heat-templates we can then use get_attr to get the data, and leave
it to Neutron to calculate and provide the routes for the L3 network.&lt;/p&gt;
&lt;p&gt;This would require &lt;a class="footnote-reference brackets" href="#id18" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, which I believe was in quite good shape before it was
abandoned due to activity policy. (An alternative would be to change
os-net-config to have an option to only change and apply routing configuration,
something like running &lt;a class="reference external" href="https://github.com/fedora-sysv/initscripts/blob/master/sysconfig/network-scripts/ifdown-routes"&gt;ifdown-routes&lt;/a&gt;
/
&lt;a class="reference external" href="https://github.com/fedora-sysv/initscripts/blob/master/sysconfig/network-scripts/ifup-routes"&gt;ifup-routes&lt;/a&gt;.
However, &lt;a class="footnote-reference brackets" href="#id18" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; is likely the better solution.)&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;&lt;strong&gt;Problem #2: Static IP assignment: Choosing static IPs from the correct
subnet&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some roles, such as Compute, can likely be placed in any subnet, but we will
need to keep certain roles co-located within the same set of L2 domains. For
instance, whatever role is providing Neutron services will need all controllers
in the same L2 domain for VRRP to work properly.&lt;/p&gt;
&lt;p&gt;The network interfaces will be configured using templates that create
configuration files for os-net-config. The IP addresses that are written to
each node’s configuration will need to be on the correct subnet for each host.
In order for Heat to assign ports from the correct subnets, we will need to
have a host-to-subnets mapping.&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas or Approaches:&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;We currently use #2, by specifying parameters for each role.&lt;/p&gt;
&lt;/div&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The simplest implementation of this would probably be a mapping of
role/index to a set of subnets, so that it is known to Heat that
Controller-1 is in subnet set X and Compute-3 is in subnet set Y. The node
would then have the ip and subnet info for each network chosen from the
appropriate set of subnets. For other nodes, we would need to
programmatically determine which subnets are correct for a given node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We could associate particular subnets with roles, and then use one role
per L2 domain (such as per-rack). This might be achieved with a map of
roles to subnets, or by specifying parameters for each role such as:
supernet, subnet (ID and/or ip/netmask), and subnet router.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Initial implementation might follow the model for isolated networking
demonstrated by the environments/ips-from-pool-all.yaml. Developing the
ips-from-pool model first will allow testing various components with
spine-and-leaf while the templates that use dynamic assignment of IPs
within specified subnets are developed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The roles and templates should be refactored to allow for dynamic IP
assignment within subnets associated with the role. We may wish to evaluate
the possibility of storing the routed subnets in Neutron using the routed
networks extensions that are still under development. However, this is
probably not required to implement separate subnets in each rack.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A scalable long-term solution is to map which subnet the host is on
during introspection. If we can identify the correct subnet for each
interface, then we can correlate that with IP addresses from the correct
allocation pool.  This would have the advantage of not requiring a static
mapping of role to node to subnet. In order to do this, additional
integration would be required between Ironic and Neutron (to make Ironic
aware of multiple subnets per network, and to add the ability to make
that association during introspection).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We will also need to take into account situations where there are heterogeneous
hardware nodes in the same layer 2 broadcast domain (such as within a rack).&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This can be done either using node groups in NetConfigDataLookup as
implemented in review &lt;a class="footnote-reference brackets" href="#id19" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; or by using additional custom roles.&lt;/p&gt;
&lt;/div&gt;
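&lt;p&gt;The role-to-subnet mapping from approach #2 can be sketched as follows. This
is only an illustration of the idea: in a real deployment Neutron acts as the
IPAM back-end and allocates the ports. The role names, subnets, and the helper
next_free_ip are hypothetical, and the sketch assumes the first host address of
each subnet is reserved for the router.&lt;/p&gt;

```python
import ipaddress

# Hypothetical role-to-subnet map: one role per L2 domain (for example, per rack).
ROLE_SUBNETS = {
    "Controller": "172.19.1.0/24",
    "ComputeRack2": "172.19.2.0/24",
}


def next_free_ip(role, allocated, role_subnets=ROLE_SUBNETS):
    """Pick the first unallocated host address in the subnet mapped to the role."""
    network = ipaddress.ip_network(role_subnets[role])
    hosts = iter(network.hosts())
    next(hosts)  # Assumption: the first host address is reserved for the router.
    for host in hosts:
        if str(host) not in allocated:
            return "{}/{}".format(host, network.prefixlen)
    raise ValueError("no free addresses left for role " + role)
```

&lt;p&gt;Because the subnet is looked up by role, a node in the hypothetical
ComputeRack2 role always receives an address from 172.19.2.0/24, regardless of
which addresses are in use on other racks.&lt;/p&gt;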
&lt;hr class="docutils"/&gt;
&lt;p&gt;&lt;strong&gt;Problem #3: Isolated Networking Requires Static Routes to Ensure Correct VLAN
is Used&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In order to continue using the Isolated Networks model, routes will need to be
in place on each node, to steer traffic to the correct VLAN interfaces. The
routes are written when os-net-config first runs, but may change. We
can’t just rely on the specific routes to other subnets, since the number of
subnets will increase or decrease as racks are added or taken away.&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas or Approaches:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Require that supernets are used for various network groups. For instance,
all the Internal API subnets would be part of a supernet such as
172.17.0.0/16, broken up into many smaller subnets, such as /24s. This would
simplify the routes, since only a single route for 172.17.0.0/16 would be
required, pointing to the local router on the 172.17.x.0/24 network.&lt;/p&gt;
&lt;p&gt;Example:
Suppose 2 subnets are provided for the Internal API network: 172.19.1.0/24
and 172.19.2.0/24. We want all Internal API traffic to traverse the Internal
API VLANs on both the controller and a remote compute node. The Internal API
network uses different VLANs for the two nodes, so we need the routes on the
hosts to point toward the Internal API gateway instead of the default
gateway. This can be provided by a supernet route to 172.19.x.x pointing to
the local gateway on each subnet (e.g. 172.19.1.1 and 172.19.2.1 on the
respective subnets). This could be represented in an os-net-config template
with the following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="o"&gt;-&lt;/span&gt;
  &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;nic3&lt;/span&gt;
  &lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt;
      &lt;span class="n"&gt;ip_netmask&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InternalApiXIpSubnet&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt;
      &lt;span class="n"&gt;ip_netmask&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InternalApiSupernet&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;next_hop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InternalApiXDefaultRoute&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Where InternalApiXIpSubnet is the IP address on the local subnet,
InternalApiSupernet is ‘172.19.0.0/16’, and InternalApiXDefaultRoute is either
172.19.1.1 or 172.19.2.1 depending on which local subnet the host belongs to.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify os-net-config so that routes can be updated without bouncing
interfaces, and then run os-net-config on all nodes when scaling occurs.
A review for this functionality is in progress &lt;a class="footnote-reference brackets" href="#id18" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Instead of passing parameters to THT about routes (or supernet routes),
implement Neutron RFE &lt;a class="footnote-reference brackets" href="#id20" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and Heat RFE &lt;a class="footnote-reference brackets" href="#id21" id="id10" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. In tripleo-heat-templates we
can then use get_attr to get the data we currently read from user provided
parameters such as the InternalApiSupernet and InternalApiXDefaultRoute in
the example above. (We might also consider replacing &lt;a class="footnote-reference brackets" href="#id21" id="id11" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; with a change
extending the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network/ports/port.j2&lt;/span&gt;&lt;/code&gt; in tripleo-heat-templates to output
this data.)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
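&lt;p&gt;The supernet route from solution #1 can be derived from a host address with
a few lines of code. This is a minimal sketch, not part of os-net-config or
tripleo-heat-templates: the function name internal_api_interface is
hypothetical, and it assumes the local router is the first host address of the
subnet, as in the 172.19.1.1 and 172.19.2.1 example.&lt;/p&gt;

```python
import ipaddress


def internal_api_interface(ip_netmask, supernet, nic="nic3"):
    """Build an os-net-config style interface entry with one supernet route."""
    local = ipaddress.ip_interface(ip_netmask)
    if local.ip not in ipaddress.ip_network(supernet):
        raise ValueError("address is outside the supernet")
    # Assumption: the local router is the first host address of the subnet.
    gateway = next(iter(local.network.hosts()))
    return {
        "type": "interface",
        "name": nic,
        "addresses": [{"ip_netmask": str(local)}],
        "routes": [{"ip_netmask": supernet, "next_hop": str(gateway)}],
    }
```

&lt;p&gt;For a host at 172.19.2.15/24 with supernet 172.19.0.0/16, the sketch produces
a single route to 172.19.0.0/16 via 172.19.2.1, so the route stays the same as
racks are added or removed.&lt;/p&gt;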
&lt;p&gt;os-net-config configures static routes for each interface. If we can keep the
routing simple (one route per functional network), then we would be able to
isolate traffic onto functional VLANs like we do today.&lt;/p&gt;
&lt;p&gt;It would be a change to the existing workflow to have os-net-config run on
updates as well as deployment, but if this were a non-impacting event (the
interfaces didn’t have to be bounced), that would probably be OK. (An
alternative is to add an option in os-net-config that only adds new routes,
something like os-net-config --no-activate followed by
ifdown-routes/ifup-routes.)&lt;/p&gt;
&lt;p&gt;At a later time, the possibility of using dynamic routing should be considered,
since it reduces the possibility of user error and is better suited to
centralized management. The overcloud nodes might participate in internal
routing protocols. SDN solutions are another way to provide this, or other
approaches may be considered, such as setting up OVS tunnels.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;&lt;strong&gt;Problem #4: Isolated Networking in TripleO Heat Templates Needs to be
Refactored&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The current isolated networking templates use parameters in nested stacks to
define the IP information for each network. There is no room in the current
schema to define multiple subnets per network, and no way to configure the
routers for each network. These values are provided by single parameters.&lt;/p&gt;
&lt;p&gt;Possible Solutions, Ideas or Approaches:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;We would need to refactor these resources to provide different routers
for each network.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We extend the custom and isolated networks in TripleO to add support for
Neutron routed-networks (segments) and multiple subnets. Each subnet will be
mapped to a different L2 segment. We should make the extension backward
compatible and only enable Neutron routed-networks (i.e. associate subnets
with segments) when the templates used define multiple subnets on a
network. To enable this we need some changes to land in Neutron and Heat.
These are the in-progress reviews:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Allow setting network-segment on subnet update &lt;a class="footnote-reference brackets" href="#id22" id="id12" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Allow updating the segment property of OS::Neutron::Subnet &lt;a class="footnote-reference brackets" href="#id23" id="id13" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add first_segment convenience attr to OS::Neutron::Net &lt;a class="footnote-reference brackets" href="#id24" id="id14" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;The proposed changes are discussed below.&lt;/p&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;In order to provide spine-and-leaf networking for deployments, several changes
will have to be made to TripleO:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Support for DHCP relay in Neutron DHCP servers (in progress), and Ironic
DHCP servers (this is addressed in separate blueprints in the same series).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Refactor assignment of Control Plane IPs to support routed networks (that
is addressed by a separate blueprint: tripleo-predictable-ctlplane-ips &lt;a class="footnote-reference brackets" href="#id17" id="id15" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Refactoring of TripleO Heat Templates network isolation to support multiple
subnets per isolated network, as well as per-subnet and supernet routes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changes to Infra CI to support testing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation updates.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The approach outlined here is very prescriptive, in that the networks must be
known ahead of time, and the IP addresses must be selected from the appropriate
pool. This is due to the reliance on static IP addresses provided by Heat.
Heat will have to model the subnets and associate them with roles (node
groups).&lt;/p&gt;
&lt;p&gt;One alternative approach is to use DHCP servers to assign IP addresses on all
hosts on all interfaces. This would simplify configuration within the Heat
templates and environment files. Unfortunately, this was the original approach
of TripleO, and it was deemed insufficient by end-users, who wanted stability
of IP addresses, and didn’t want to have an external dependency on DHCP.&lt;/p&gt;
&lt;p&gt;Another approach is to use the DHCP server functionality in the network switch
infrastructure in order to PXE boot systems via DHCP, then assign static IP
addresses after the PXE boot is done. This approach only solves part of the
requirement: the net booting. It does not solve the desire to have static IP
addresses on each network. This could be achieved by having static IP addresses
in some sort of per-node map. However, this approach is not as scalable as
programmatically determining the IPs, since it only applies to a fixed number of
hosts. We want to retain the ability of using Neutron as an IP address
management (IPAM) back-end, ideally.&lt;/p&gt;
&lt;p&gt;Another approach which was considered was simply trunking all networks back
to the Undercloud, so that dnsmasq could respond to DHCP requests directly,
rather than requiring a DHCP relay. Unfortunately, this has already been
identified as being unacceptable by some large operators, who have network
architectures that make heavy use of L2 segregation via routers. This also
won’t work well in situations where there is geographical separation between
the VLANs, such as in split-site deployments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;One of the major differences between spine-and-leaf and standard isolated
networking is that the various subnets are connected by routers, rather than
being completely isolated. This means that without proper ACLs on the routers,
networks which should be private may be opened up to outside traffic.&lt;/p&gt;
&lt;p&gt;This should be addressed in the documentation, and it should be stressed that
ACLs should be in place to prevent unwanted network traffic. For instance, the
Internal API network is sensitive in that the database and message queue
services run on that network. It is supposed to be isolated from outside
connections. This can be achieved fairly easily if supernets are used, so that
if all Internal API subnets are a part of the 172.19.0.0/16 supernet, a simple
ACL rule will allow only traffic between Internal API IPs (this is a simplified
example that would be generally applicable to all Internal API router VLAN
interfaces or for a global ACL):&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="n"&gt;traffic&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;
&lt;span class="n"&gt;deny&lt;/span&gt; &lt;span class="n"&gt;traffic&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mf"&gt;172.19.0.0&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The isolated networks design separates control plane traffic from data plane
traffic, and separates administrative traffic from tenant traffic. In order
to preserve this separation of traffic, we will use static routes pointing
to supernets. This ensures all traffic to any subnet within a network will exit
via the interface attached to the local subnet in that network. It will be
important for the end user to implement ACLs in a routed network to prevent
remote access to networks that would be completely isolated in a shared L2
deployment.&lt;/p&gt;
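&lt;p&gt;The supernet-based ACL above only works if every Internal API subnet really
falls inside the supernet. The following is a minimal sketch of how that could
be checked before deployment, using Python’s standard ipaddress module. The
per-leaf subnets and the function names are hypothetical and are not part of
TripleO; only the 172.19.0.0/16 supernet comes from the example above.&lt;/p&gt;

```python
import ipaddress

# The supernet is taken from the ACL example; the per-leaf subnets are
# invented for illustration only.
INTERNAL_API_SUPERNET = ipaddress.ip_network("172.19.0.0/16")
LEAF_SUBNETS = {
    "leaf0": ipaddress.ip_network("172.19.0.0/24"),
    "leaf1": ipaddress.ip_network("172.19.1.0/24"),
    "leaf2": ipaddress.ip_network("172.19.2.0/24"),
}


def subnets_outside_supernet(supernet, subnets):
    """Return the sorted names of subnets not contained in the supernet."""
    return sorted(
        name for name, net in subnets.items() if not net.subnet_of(supernet)
    )


def acl_permits(source, destination, supernet):
    """Mimic the two-rule ACL for a single source/destination pair."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    if dst not in supernet:
        # The example ACL only governs traffic destined to the supernet.
        return True
    # Allow supernet-to-supernet traffic, deny everything else.
    return src in supernet
```

&lt;p&gt;A subnet reported by subnets_outside_supernet would not be covered by the
single ACL rule and would need its own rule or a revised addressing plan.&lt;/p&gt;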
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Deploying with spine-and-leaf will require additional parameters to
provide the routing information and multiple subnets required. This will have
to be documented. Furthermore, the validation scripts may need to be updated
to ensure that the configuration is validated, and that there is proper
connectivity between overcloud hosts.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Much of the traffic that is carried over layer 2 today will traverse layer
3 routing boundaries in this design. That adds some minimal latency and overhead,
although in practice the difference may not be noticeable. One important
consideration is that the routers must not be too overcommitted on their
uplinks, and the routers must be monitored to ensure that they are not acting
as a bottleneck, especially if complex access control lists are used.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;A spine-and-leaf deployment will be more difficult to troubleshoot than a
deployment that simply uses a set of VLANs. The deployer may need to have
more network expertise, or a dedicated network engineer may be needed to
troubleshoot in some cases.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Spine-and-leaf is not easily tested in virt environments. This should be
possible, but due to the complexity of setting up libvirt bridges and
routes, we may want to provide a pre-configured quickstart environment
for testing. This may involve building multiple libvirt bridges
and routing between them on the Undercloud, or it may involve using a
DHCP relay on the virt-host as well as routing on the virt-host to simulate
a full routing switch. A plan for development and testing will need to be
developed, since not every developer can be expected to have a routed
environment to work in. It may take some time to develop a routed virtual
environment, so initial work will be done on bare metal.&lt;/p&gt;
&lt;p&gt;A separate blueprint will cover adding routed network support to
tripleo-quickstart.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Dan Sneddon &amp;lt;&lt;a class="reference external" href="mailto:dsneddon%40redhat.com"&gt;dsneddon&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Other assignees:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Bob Fournier &amp;lt;&lt;a class="reference external" href="mailto:bfournie%40redhat.com"&gt;bfournie&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Harald Jensas &amp;lt;&lt;a class="reference external" href="mailto:hjensas%40redhat.com"&gt;hjensas&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Steven Hardy &amp;lt;&lt;a class="reference external" href="mailto:shardy%40redhat.com"&gt;shardy&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dan Prince &amp;lt;&lt;a class="reference external" href="mailto:dprince%40redhat.com"&gt;dprince&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="approver-s"&gt;
&lt;h3&gt;Approver(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary approver:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Alex Schultz &amp;lt;&lt;a class="reference external" href="mailto:aschultz%40redhat.com"&gt;aschultz&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Implement support for DHCP on routed networks using DHCP relay, as
described in Problem #1 above.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add parameters to Isolated Networking model in Heat to support supernet
routes for individual subnets, as described in Problem #3.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify Isolated Networking model in Heat to support multiple subnets, as
described in Problem #4.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement support for iptables on the Controller, in order to mitigate
the APIs potentially being reachable via remote routes, as described in
the Security Impact section. Alternatively, document the mitigation
procedure using ACLs on the routers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document the testing procedures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the documentation in tripleo-docs to cover the spine-and-leaf case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the Ironic-Inspector service to record the host-to-subnet mappings,
perhaps during introspection, to address Problem #2 (long-term).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="implementation-details"&gt;
&lt;h3&gt;Implementation Details&lt;/h3&gt;
&lt;p&gt;Workflow:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Operator configures DHCP networks and IP address ranges&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operator imports baremetal instackenv.json&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When introspection or deployment is run, the DHCP server receives the DHCP
request from the baremetal host via DHCP relay&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the node has not been introspected, reply with an IP address from the
introspection pool* and the inspector PXE boot image&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the node already has been introspected, then the server assumes this is
a deployment attempt, and replies with the Neutron port IP address and the
overcloud-full deployment image&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Heat templates are processed to generate os-net-config templates, and
os-net-config is run to assign static IPs from the correct subnets, as well
as routes to other subnets via the router gateway addresses.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When using spine-and-leaf, the DHCP server will need to provide an
introspection IP address on the appropriate subnet, depending on the
information contained in the DHCP relay packet that is forwarded by the segment
router. dnsmasq will automatically match the gateway address (GIADDR) of the
router that forwarded the request to the subnet where the DHCP request was
received, and will respond with an IP and gateway appropriate for that subnet.&lt;/p&gt;
&lt;p&gt;The above workflow for the DHCP server should allow for provisioning IPs on
multiple subnets.&lt;/p&gt;
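&lt;p&gt;The subnet selection described above can be illustrated with a short
sketch. This is not dnsmasq code; it only models the idea of matching the relay
gateway address (GIADDR) against the configured subnets. The subnet values and
the function name are assumptions made for the example.&lt;/p&gt;

```python
import ipaddress

# Hypothetical introspection subnets, one per routed segment.
SUBNETS = [
    ipaddress.ip_network("192.168.10.0/24"),
    ipaddress.ip_network("192.168.20.0/24"),
]


def subnet_for_giaddr(giaddr, subnets):
    """Return the subnet containing the relay's gateway address, or None."""
    relay = ipaddress.ip_address(giaddr)
    for net in subnets:
        if relay in net:
            return net
    # No configured subnet matches the relay, so no lease can be offered.
    return None
```

&lt;p&gt;A request relayed by a router at 192.168.20.1 would be matched to the
192.168.20.0/24 subnet, and a relay address outside every configured subnet
would receive no offer in this model.&lt;/p&gt;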
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;There may be a dependency on the Neutron Routed Networks. This won’t be clear
until a full evaluation is done on whether we can represent spine-and-leaf
using only multiple subnets per network.&lt;/p&gt;
&lt;p&gt;There will be a dependency on routing switches that perform DHCP relay service
for production spine-and-leaf deployments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;In order to properly test this framework, we will need to establish at least
one CI test that deploys spine-and-leaf. As discussed in this spec, it isn’t
necessary to have a full routed bare metal environment in order to test this
functionality, although there is some work to get it working in virtual
environments such as OVB.&lt;/p&gt;
&lt;p&gt;For bare metal testing, it is sufficient to trunk all VLANs back to the
Undercloud, then run DHCP proxy on the Undercloud to receive all the
requests and forward them to br-ctlplane, where dnsmasq listens. This
will provide a substitute for routers running DHCP relay. For Neutron
DHCP, some modifications to the iptables rule may be required to ensure
that all DHCP requests from the overcloud nodes are received by the
DHCP proxy and/or the Neutron dnsmasq process running in the dhcp-agent
namespace.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The procedure for setting up a dev environment will need to be documented,
and a work item mentions this requirement.&lt;/p&gt;
&lt;p&gt;The TripleO docs will need to be updated to include detailed instructions
for deploying in a spine-and-leaf environment, including the environment
setup. Covering specific vendor implementations of switch configurations
is outside this scope, but a specific overview of required configuration
options should be included, such as enabling DHCP relay (or “helper-address”
as it is also known) and setting the Undercloud as a server to receive
DHCP requests.&lt;/p&gt;
&lt;p&gt;The updates to TripleO docs will also have to include a detailed discussion
of choices to be made about IP addressing before a deployment. If supernets
are to be used for network isolation, then a good plan for IP addressing will
be required to ensure scalability in the future.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id16" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;0&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-deployment"&gt;Blueprint: TripleO Routed Networks for Deployments&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id17" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id15"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/421010/"&gt;Spec: User-specifiable Control Plane IP on TripleO Routed Isolated Networks&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id18" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id5"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id6"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id8"&gt;3&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/152732/"&gt;Review: Modify os-net-config to make changes without bouncing interface&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id19" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/406641/"&gt;Review: Add support for node groups in NetConfigDataLookup&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id20" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id3"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id9"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://bugs.launchpad.net/neutron/+bug/1766380"&gt;[RFE] Create host-routes for routed networks (segments)&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id21" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id4"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id10"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id11"&gt;3&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://storyboard.openstack.org/#!/story/1766946"&gt;[RFE] Extend attributes of Server and Port resource to client interface configuration data&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id22" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id12"&gt;7&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/523972"&gt;Allow setting network-segment on subnet update&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id23" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id13"&gt;8&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/567206"&gt;Allow updating the segment property of OS::Neutron::Subnet&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id24" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id14"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/567207"&gt;Add first_segment convenience attr to OS::Neutron::Net&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Mon, 19 Oct 2020 00:00:00 </pubDate></item><item><title>Install and Configure FRRouter</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/triplo-bgp-frrouter.html</link><description>
 
&lt;p&gt;The goal of this spec is to design and plan requirements for adding support to
TripleO to install and provide a basic configuration of Free Range Routing (FRR)
on overcloud nodes in order to support BGP dynamic routing. There are multiple
reasons why an administrator might want to run FRR, including to obtain
multiple routes on multiple uplinks to northbound switches, or to advertise
routes to networks or IP addresses via dynamic routing protocols.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem description&lt;/h2&gt;
&lt;p&gt;There are several use cases for using BGP, and in fact there are separate
efforts underway to utilize BGP for the control plane and data plane.&lt;/p&gt;
&lt;p&gt;BGP may be used for equal-cost multipath (ECMP) load balancing of outbound
links, and bi-directional forwarding detection (BFD) for resiliency to ensure
that a path provides connectivity. For outbound connectivity BGP will learn
routes from BGP peers.&lt;/p&gt;
&lt;p&gt;BGP may be used for advertising routes to API endpoints. In this model HAProxy
will listen on an IP address and FRR will advertise routes to that IP to BGP
peers. High availability for HAProxy is provided via other means such as
Pacemaker, and FRR will simply advertise the virtual IP address when it is
active on an API controller.&lt;/p&gt;
&lt;p&gt;BGP may also be used for routing inbound traffic to provider network IPs or
floating IPs for instance connectivity. The Compute nodes will run FRR to
advertise routes to the local VM IPs or floating IPs hosted on the node. FRR
has a daemon named Zebra that is responsible for exchanging routes between
routing daemons such as BGP and the kernel. The &lt;em&gt;redistribute connected&lt;/em&gt;
statement in the FRR configuration will cause local IP addresses on the host
to be advertised via BGP. Floating IP addresses are attached to a loopback
interface in a namespace, so they will be redistributed using this method.
Changes to OVN will be required to ensure provider network IPs assigned to VMs
will be assigned to a loopback interface in a namespace in a similar fashion.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Create a container with FRR. The container will run the BGP daemon, BFD
daemon, and Zebra daemon (which copies routes to/from the kernel). Provide a
basic configuration that would allow BGP peering with multiple peers. In the
control plane use case the FRR container needs to be started along with the HA
components, but in the data plane use case the container will be a sidecar
container supporting Neutron. The container is defined in a change proposed
here: &lt;a class="footnote-reference brackets" href="#id5" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The container will be deployed using a TripleO Deployment Service. The service
will use Ansible to template the FRR configuration file, and a simple
implementation exists in a proposed change here: &lt;a class="footnote-reference brackets" href="#id6" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The current FRR Ansible module is insufficient to configure BGP parameters and
would need to be extended. At this time the Ansible Networking development
team is not interested in extending the FRR module, so the configuration will
be provided using TripleO templates for the FRR main configuration file and
daemon configuration file. Those templates are defined in a change proposed
here: &lt;a class="footnote-reference brackets" href="#id7" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A user-modifiable environment file will need to be provided so the installer
can provide the configuration data needed for FRR (see User Experience below).&lt;/p&gt;
&lt;p&gt;OVN will need to be modified to enable the Compute node to assign VM provider
network IPs to a loopback interface inside a namespace. These IP addresses will
not be used for sending or receiving traffic, only for redistributing routes
to the IPs to BGP peers. Traffic which is sent to those IP addresses will be
forwarded to the VM using OVS flows on the hypervisor.  An example agent for
OVN has been written to demonstrate how to monitor the southbound OVN DB and
create loopback IP addresses when a VM is started on a Compute node. The OVN
changes will be detailed in a separate OVN spec. Demonstration code is
available on GitHub: &lt;a class="footnote-reference brackets" href="#id8" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;section id="user-experience"&gt;
&lt;h4&gt;User Experience&lt;/h4&gt;
&lt;p&gt;The installer will need to provide some basic information for the FRR
configuration, including whether to enable BFD, BGP IPv4, BGP IPv6,
and other settings. See the Example Configuration Data section below.&lt;/p&gt;
&lt;p&gt;Additional user-provided data may include inbound or outbound filter prefixes.
The default filter prefixes will accept only default routes via BGP, and will
export only loopback IPs, which have a /32 subnet mask for IPv4 or /128 subnet
mask for IPv6.&lt;/p&gt;
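&lt;p&gt;The intent of the default filters can be sketched in a few lines of Python.
This is only an illustration of the rule, not the FRR prefix-list syntax or the
TripleO template; the function names are hypothetical.&lt;/p&gt;

```python
import ipaddress


def accept_inbound(prefix):
    """Accept only default routes learned from peers (0.0.0.0/0 or ::/0)."""
    return ipaddress.ip_network(prefix).prefixlen == 0


def export_outbound(prefix):
    """Export only host routes: /32 for IPv4 or /128 for IPv6."""
    net = ipaddress.ip_network(prefix)
    # max_prefixlen is 32 for IPv4 networks and 128 for IPv6 networks.
    return net.prefixlen == net.max_prefixlen
```

&lt;p&gt;Under this rule a peer advertising 10.0.0.0/8 would be ignored, while a
loopback address such as 192.0.2.10/32 would be advertised.&lt;/p&gt;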
&lt;/section&gt;
&lt;section id="example-configuration-data"&gt;
&lt;h4&gt;Example Configuration Data&lt;/h4&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;tripleo_frr_bfd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_bgp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_bgp_ipv4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_bgp_ipv4_allowas_in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_bgp_ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_bgp_ipv6_allowas_in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_config_basedir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"/var/lib/config-data/ansible-generated/frr"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ansible_hostname&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_log_level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;informational&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_watchfrr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tripleo_frr_zebra&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h2&gt;Alternatives&lt;/h2&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Routing outbound traffic via multiple uplinks&lt;/p&gt;
&lt;p&gt;Fault-tolerance and load-balancing for outbound traffic is typically
provided by bonding Ethernet interfaces. This works for most cases, but
is susceptible to unidirectional interface failure, a situation where
traffic works in only one direction. The LACP protocol for bonding does
provide some protection against unidirectional traffic failures, but is not
as robust as bi-directional forwarding detection (BFD) provided by FRR.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Routing inbound traffic to highly-available API endpoints&lt;/p&gt;
&lt;p&gt;The most common method currently used to provide HA for API endpoints is
to use a virtual IP that fails over from active to standby nodes using a
shared Ethernet MAC address. The drawback to this method is that all
standby API controllers must reside on the same layer 2 segment (VLAN) as
the active controller. This presents a challenge if the operator wishes
to place API controllers in different failure domains for power and/or
networking. A BGP daemon avoids this limitation by advertising a route
to the shared IP address directly to the BGP peering router over a routed
layer 3 link.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Routing to Neutron IP addresses&lt;/p&gt;
&lt;p&gt;Data plane traffic is usually delivered to provider network or floating
IP addresses via the Ethernet MAC address associated with the IP and
determined via ARP requests on a shared VLAN. This requires that every
Compute node which may host a provider network IP or floating IP has
the appropriate VLAN trunked to a provider bridge attached to an interface
or bond. This makes it impossible to migrate VMs or floating IPs across
layer 3 boundaries in edge computing topologies or in a fully layer 3
routed datacenter.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h2&gt;Security Impact&lt;/h2&gt;
&lt;p&gt;There have been no direct security impacts identified with this approach. The
installer should ensure that security policy on the network as a whole prevents
IP spoofing which could divert legitimate traffic to an unintended host. This
is a concern whether or not the OpenStack nodes are using BGP themselves, and
may be an issue in environments using traditional routing architecture or
static routes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h2&gt;Upgrade Impact&lt;/h2&gt;
&lt;p&gt;When (if) we remove the capability to manage network resources in the
overcloud heat stack, we will need to evaluate whether we want to continue
to provide BGP configuration as a part of the overcloud configuration.&lt;/p&gt;
&lt;p&gt;If an operator wishes to begin using BGP routing at the same time as
upgrading the version of OpenStack used they will need to provide the
required configuration parameters if they differ from the defaults provided
in the TripleO deployment service.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h2&gt;Performance Impact&lt;/h2&gt;
&lt;p&gt;No performance impacts, either positive or negative, are expected from using
this approach. Attempts have been made to minimize memory and CPU usage by
using conservative defaults in the configuration.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;This is a new TripleO deployment service and should be properly documented
to instruct installers in the configuration of FRR for their environment.&lt;/p&gt;
&lt;p&gt;The TripleO docs will need updates in many sections, including:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/install_overcloud.html"&gt;TripleO OpenStack Deployment&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html#"&gt;Provisioning Baremetal Before Overcloud Deploy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/custom_networks.html"&gt;Deploying with Custom Networks&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/network_isolation.html"&gt;Configuring Network Isolation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/routed_spine_leaf_network.html"&gt;Deploying Overcloud with L3 routed networking&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The FRR daemons are documented elsewhere, and we should not need to document
usage of BGP in general, as this is a standard protocol. The configuration of
top-of-rack switches is different depending on the make and model of routing
switch used, and we should not expect to provide configuration examples for
network hardware.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;The implementation will require a new TripleO deployment service, container
definition, and modifications to the existing role definitions. Those changes
are proposed upstream; see the References section for links.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="assignee-s"&gt;
&lt;h2&gt;Assignee(s)&lt;/h2&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Dan Sneddon&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Secondary assignees:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Michele Baldessari&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Carlos Goncalves&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daniel Alvarez Sanchez&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Luis Tomas Bolivar&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h2&gt;Work Items&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Develop the container definition&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define the TripleO deployment service templates&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define the TripleO Ansible role&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the existing TripleO roles to support the above changes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merge the changes to the container, deployment service, and Ansible role&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure FRR packages are available for supported OS versions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id5" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-common/+/763087"&gt;Review: DNR/DNM Frr support&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id6" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-ansible/+/763572"&gt;Review: Add tripleo_frr role&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id7" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/c/openstack/tripleo-heat-templates/+/763657"&gt;Review: WIP/DNR/DNM FRR service&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id8" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id4"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://gist.github.com/luis5tb/93cc01ebfea5d44abf07c0303e7d1514"&gt;OVN BGP Agent&lt;/a&gt;.&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Wed, 14 Oct 2020 00:00:00 </pubDate></item><item><title>TripleO Ceph Client</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ceph-client.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-ceph-client"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-ceph-client&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Native Ansible roles for TripleO integration with Ceph clusters.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Starting in the Octopus release, Ceph has its own day1 tool called
cephadm &lt;a class="footnote-reference brackets" href="#id6" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and its own day2 tool called orchestrator &lt;a class="footnote-reference brackets" href="#id7" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; which
will replace ceph-ansible &lt;a class="footnote-reference brackets" href="#id8" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. While ceph-ansible had the necessary
features to configure Ceph clients, for example distributing config files
and keyrings as necessary on nodes which aren’t members of the Ceph cluster,
neither cephadm nor the orchestrator will manage Ceph client configuration.&lt;/p&gt;
&lt;p&gt;The goal is to create new Ansible roles in TripleO to perform the
Ceph client (Nova, Cinder, Glance, Manila) configuration. This is of special
importance in TripleO to support deployment scenarios where the Ceph cluster
is externally managed, not controlled by the undercloud, yet the OpenStack
services configuration remains a responsibility of TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;span id="id4"/&gt;&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Introduce a new role into tripleo-ansible for Ceph client configuration.&lt;/p&gt;
&lt;p&gt;The new role will:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Configure OpenStack services as clients of an external Ceph cluster
(in the case of collocation, the ceph cluster is still logically
external)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide Ceph configuration files and cephx keys for OpenStack
clients of RBD and CephFS (Nova, Cinder, Glance, Manila)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide full multiclient support, so that one OpenStack deployment may
use multiple Ceph clusters, e.g. multibackend Glance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure clients quickly, e.g. generate the key in one place
and copy it efficiently&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Be a standalone role which is reusable to configure OpenStack
against an externally managed Ceph cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not break existing support for CephExternalMultiConfig which is used
for configuring OpenStack to work with more than one Ceph cluster
when deploying Ceph in DCN environments (Deployment of dashboard on
DCN sites is not in scope with this proposal).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Support for client configuration might be added in future versions
of cephadm, yet there are some reasons why we wouldn’t be able to use this
feature as-is even if it were available today:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;it assumes that the cephadm tool is configured with admin privileges
for the external Ceph cluster, which we don’t have when Ceph is not
managed by TripleO;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;it also assumes that each and every client node has been provisioned into
the external Ceph orchestrator inventory so that every Ceph MON is able to
log into the client node (overcloud nodes) via SSH;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;while offering the necessary functionality to copy the config
files and cephx keyrings over to remote client nodes, it won’t be able to
configure for example Nova with the libvirtd secret for qemu-kvm, which is
a task only relevant when the client is OpenStack.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None derived directly from the decision to create new Ansible roles. The
distribution of the cephx keyrings itself, though, should be implemented using
a TripleO service, like the existing CephClient service, so that keyrings
are only deployed on those nodes which actually need them.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;The goal is to preserve and reuse any existing Heat parameter which is
currently consumed to drive ceph-ansible. From the operators’ perspective the
problem of configuring a Ceph client isn’t changed and there shouldn’t be
a need to change the existing parameters; only the implementation
will change.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;As described in the &lt;a class="reference internal" href="#proposed-change"&gt;&lt;span class="std std-ref"&gt;Proposed Change&lt;/span&gt;&lt;/a&gt; section, the purpose of this
role is to properly configure clients so that OpenStack services can
connect to an internal or external Ceph cluster, as well as to multiple Ceph
clusters in a DCN context.
Since both config files and keys are necessary for many OpenStack services
(Nova, Cinder, Glance, Manila) to interact properly with
the Ceph cluster, at least two actions should be performed:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;generate keys in one place&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;copy the generated keys efficiently&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;cite&gt;ceph_client&lt;/cite&gt; role should be very small, and a first improvement
in terms of performance comes from key generation, since keys are
created in one centralized place.
The generated keys, along with the Ceph cluster config file, then just need
to be distributed across the nodes of the Ceph cluster.
Adding this role to tripleo-ansible avoids adding extra calls from
a pure deployment perspective; in fact, no additional Ansible playbooks
will be triggered and we expect to see performance improve since no
additional layers are involved here.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;How Ceph is deployed could change for anyone maintaining TripleO code
for OpenStack services which use Ceph. In theory there should be no
change as the CephClient service will still configure the Ceph
configuration and Ceph key files in the same locations. Those
developers will just need to switch to the new templates when they are
stable.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;The new role should be enabled by a TripleO service, as happens
today with the CephClient service.
Depending on the environment file chosen at deployment time, the
actual implementation of such a service could either be based on
ceph-ansible or on the new role.&lt;/p&gt;
&lt;p&gt;When the Ceph cluster is not external, the role will also create
pools and the cephx keyrings in the Ceph cluster; these steps
will be skipped when Ceph is external, precisely because we won’t
have admin privileges to change the cluster configuration in that case.&lt;/p&gt;
&lt;section id="tripleo-heat-templates"&gt;
&lt;h3&gt;TripleO Heat Templates&lt;/h3&gt;
&lt;p&gt;The existing implementation which depends on ceph-ansible will remain
in-tree for at least one deprecation cycle. By reusing the existing Heat
input parameters we should be able to transparently make the client
configuration happen with ceph-ansible or the new role just by
switching the environment file used at deployment time.
TripleO users who currently use
&lt;cite&gt;environments/ceph-ansible/ceph-ansible-external.yaml&lt;/cite&gt; in order to
have their Overcloud use an existing Ceph cluster should be able to
apply the same templates with the new environment file for configuring Ceph
clients, e.g. &lt;cite&gt;environments/ceph-client.yaml&lt;/cite&gt;. This will result in
the new tripleo-ansible/roles/ceph_client role being executed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;fmount&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;fultonj&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gfidente&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;jmolmo&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;/section&gt;
&lt;section id="proposed-schedule"&gt;
&lt;h3&gt;Proposed Schedule&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;OpenStack W: start tripleo-ansible/roles/ceph_client as experimental
and then set it as default in scenarios 001/004. We expect it to
become stable during the W cycle.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;The &lt;cite&gt;ceph_client&lt;/cite&gt; role will be added in tripleo-ansible and allow
configuring the OpenStack services as clients of an external or TripleO
managed Ceph cluster; no new dependencies are added for the tripleo-ansible
project. The &lt;cite&gt;ceph_client&lt;/cite&gt; role will work with External Ceph, Internal
Ceph deployed by ceph-ansible, and the Ceph deployment described in
&lt;a class="footnote-reference brackets" href="#id9" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;It should be possible to reconfigure one of the existing CI scenarios
already deploying with Ceph to use the newer &lt;cite&gt;ceph_client&lt;/cite&gt; role,
making it non-voting until the code is stable. Then switch the other
existing CI scenario to it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;No doc changes should be needed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id6" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/ceph/ceph/tree/master/src/cephadm"&gt;cephadm&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id7" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.ceph.com/docs/octopus/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id8" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id9" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/#/c/723108"&gt;tripleo-ceph&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Mon, 12 Oct 2020 00:00:00 </pubDate></item><item><title>Network Data format/schema (v2)</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/triplo-network-data-v2.html</link><description>
 
&lt;p&gt;The network data schema (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_data.yaml&lt;/span&gt;&lt;/code&gt;) used to define composable
networks in TripleO has had several additions since it was first introduced.
Due to legacy compatibility, some additions make the schema somewhat
non-intuitive, such as the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;subnets&lt;/span&gt;&lt;/code&gt; map that was introduced to add
support for routed networks.&lt;/p&gt;
&lt;p&gt;The goal of this spec is to prompt discussion and settle on a new network data
(v2) format that will be used once management of network resources such
as networks, segments and subnets is moved out of the heat stack.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem description&lt;/h2&gt;
&lt;p&gt;The current schema is somewhat inconsistent, and not as precise as it could
be. For example, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;base&lt;/span&gt;&lt;/code&gt; subnet is at level-0, while additional
subnets are in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;subnets&lt;/span&gt;&lt;/code&gt; map. It would be more intuitive to define
all subnets in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;subnets&lt;/span&gt;&lt;/code&gt; map.&lt;/p&gt;
&lt;p&gt;Currently the network resource properties are configured via a mix of
parameters in the heat environment and network data. For example
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dns_domain&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;admin_state_up&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;enable_dhcp&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ipv6_address_mode&lt;/span&gt;&lt;/code&gt;,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ipv6_ra_mode&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;shared&lt;/span&gt;&lt;/code&gt; properties are configured via Heat parameters,
while other properties such as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;cidr&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gateway_ip&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;host_routes&lt;/span&gt;&lt;/code&gt; etc.
are defined in network data.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Change the network data format so that all network properties are managed in
network data, so that network resources can be managed outside of the heat
stack.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Network data v2 format will only be used with the new tooling that
will manage networks outside of the heat stack.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Network data v2 format should stay compatible with tripleo-heat-templates
jinja2 rendering outside of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Network&lt;/span&gt;&lt;/code&gt; resource and its
subresources &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Network::{{network.name}}&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;section id="user-experience"&gt;
&lt;h4&gt;User Experience&lt;/h4&gt;
&lt;p&gt;Tooling will be provided for users to export the network information from
an existing deployment. This tooling will output a network data file in
v2 format, which from then on can be used to manage the network resources
using tripleoclient commands or tripleo-ansible cli playbooks.&lt;/p&gt;
&lt;p&gt;The command line tool to manage the network resources will output the
environment file that must be included when deploying the heat stack. (Similar
to the environment file produced when provisioning baremetal nodes without
nova.)&lt;/p&gt;
&lt;/section&gt;
&lt;section id="cli-commands"&gt;
&lt;h4&gt;CLI Commands&lt;/h4&gt;
&lt;p&gt;Command to export provisioned overcloud network information to network data v2
format.&lt;/p&gt;
&lt;div class="highlight-shell notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;openstack overcloud network &lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  --stack &amp;lt;stack_name&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  --output &amp;lt;network_data_v2.yaml&amp;gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Command to create/update overcloud networks outside of heat.&lt;/p&gt;
&lt;div class="highlight-shell notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;openstack overcloud network provision &lt;span class="se"&gt;\&lt;/span&gt;
  --networks-file &amp;lt;network_data_v2.yaml&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  --output &amp;lt;network_environment.yaml&amp;gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Main differences between the current network data schema and the v2 schema
proposed here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Base subnet is moved to the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;subnets&lt;/span&gt;&lt;/code&gt; map, aligning configuration for
non-routed and routed deployments (spine-and-leaf, DCN/Edge)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;enabled&lt;/span&gt;&lt;/code&gt; (bool) key is no longer used. Disabled networks should be
excluded from the file, either removed or commented out.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compat_name&lt;/span&gt;&lt;/code&gt; option is no longer required. This was used to change
the name of the heat resource internally. Since the heat resource will be a
thing of the past with network data v2, we don’t need it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The keys &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ip_subnet&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gateway_ip&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;allocation_pools&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;routes&lt;/span&gt;&lt;/code&gt;,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ipv6_subnet&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gateway_ipv6&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ipv6_allocation_pools&lt;/span&gt;&lt;/code&gt; and
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;routes_ipv6&lt;/span&gt;&lt;/code&gt; are no longer valid at the network level.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New key &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;physical_network&lt;/span&gt;&lt;/code&gt;: our current physical_network names for base and
non-base segments are not quite compatible. Adding logic in code to
compensate is complex. (This field may come in handy when creating ironic
ports in metalsmith as well.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New keys &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_type&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;segmentation_id&lt;/span&gt;&lt;/code&gt; since we could have users
that used &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;{{network.name}}NetValueSpecs&lt;/span&gt;&lt;/code&gt; to set network_type vlan.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
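&lt;p&gt;As a rough illustration of the differences listed above, the following Python
sketch moves the former network-level subnet keys of a v1 entry into the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;subnets&lt;/span&gt;&lt;/code&gt; map and drops &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;enabled&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compat_name&lt;/span&gt;&lt;/code&gt;. It is not the
export tooling described in this spec. The function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;to_v2&lt;/span&gt;&lt;/code&gt; and the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;base_subnet_name&lt;/span&gt;&lt;/code&gt; parameter are hypothetical, and the name chosen for the
former base subnet is an assumption, since the spec does not define it.&lt;/p&gt;

```python
# Keys the spec lists as no longer valid at the network level in v2.
SUBNET_LEVEL_KEYS = (
    "ip_subnet", "gateway_ip", "allocation_pools", "routes",
    "ipv6_subnet", "gateway_ipv6", "ipv6_allocation_pools", "routes_ipv6",
)
# Keys the spec describes as no longer used or no longer required.
DROPPED_KEYS = ("enabled", "compat_name")


def to_v2(network, base_subnet_name):
    """Return a v2-style copy of a v1 network entry (illustrative only).

    base_subnet_name is a hypothetical parameter: the spec does not say
    how the former base subnet should be named inside the subnets map.
    """
    skip = set(SUBNET_LEVEL_KEYS) | set(DROPPED_KEYS) | {"subnets"}
    # Keep every network-level key that is still valid in v2.
    v2 = {key: value for key, value in network.items() if key not in skip}
    # Collect the base subnet settings that used to live at level-0.
    base = {key: network[key] for key in SUBNET_LEVEL_KEYS if key in network}
    subnets = {base_subnet_name: base} if base else {}
    # Existing entries of the subnets map are carried over unchanged.
    subnets.update(network.get("subnets", {}))
    v2["subnets"] = subnets
    return v2


v1_network = {
    "name": "Storage",
    "enabled": True,
    "ip_subnet": "172.18.0.0/24",
    "subnets": {"subnet01": {"ip_subnet": "172.18.1.0/24"}},
}
print(to_v2(v1_network, "storage_subnet"))
```

&lt;p&gt;The input entry is left unmodified, so the same v1 data can still be used
with the existing tooling.&lt;/p&gt;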
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;The new tooling should validate that none of the keys previously
valid in network data v1 are used in network data v2.&lt;/p&gt;
&lt;/div&gt;
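&lt;p&gt;A minimal sketch of such a check, written in Python, might look like the
following. It assumes the network data has already been loaded into a list of
dictionaries. The function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;find_v1_keys&lt;/span&gt;&lt;/code&gt; is hypothetical, and treating
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;enabled&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compat_name&lt;/span&gt;&lt;/code&gt; as errors is an assumption; the set of keys is
taken from the list of differences above.&lt;/p&gt;

```python
# Keys from the list of differences: either dropped in v2 or only
# valid inside the subnets map. Treating all of them as errors at
# the network level is an assumption of this sketch.
V1_ONLY_NETWORK_KEYS = frozenset({
    "enabled", "compat_name",
    "ip_subnet", "gateway_ip", "allocation_pools", "routes",
    "ipv6_subnet", "gateway_ipv6", "ipv6_allocation_pools", "routes_ipv6",
})


def find_v1_keys(networks):
    """Map each network name to the sorted v1-only keys found at its top level."""
    problems = {}
    for net in networks:
        # Intersecting with a dict compares against its keys only.
        found = sorted(V1_ONLY_NETWORK_KEYS.intersection(net))
        if found:
            problems[net.get("name", "unnamed")] = found
    return problems


example_networks = [
    # Valid v2 entry: ip_subnet appears only inside the subnets map.
    {"name": "Storage", "subnets": {"subnet01": {"ip_subnet": "172.18.1.0/24"}}},
    # v1-style entry: subnet settings and enabled at the network level.
    {"name": "Tenant", "enabled": True, "ip_subnet": "172.16.0.0/24"},
]
print(find_v1_keys(example_networks))
```

&lt;p&gt;Only the top level of each network entry is inspected, so keys such as
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ip_subnet&lt;/span&gt;&lt;/code&gt; remain acceptable inside the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;subnets&lt;/span&gt;&lt;/code&gt; map.&lt;/p&gt;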
&lt;/section&gt;
&lt;section id="example-network-data-v2-file-for-ipv4"&gt;
&lt;h4&gt;Example network data v2 file for IPv4&lt;/h4&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;name_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage                     (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;name.lower())&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;admin_state_up&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false                   (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;dns_domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage.localdomain.        (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;mtu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1442                               (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1500)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;shared&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false                           (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;service_net_map_replace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage        (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true                              (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;vip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true                               (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;subnets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;subnet01&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.0/24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.254            (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;allocation_pools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;(optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;- start&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.10&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;end&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.250&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;enable_dhcp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false                  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;                             &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;(optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;- destination&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.0/24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;nexthop&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;vlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;21                            (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;physical_network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet01  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt;name.lower&lt;/span&gt;&lt;span class="p p-Indicator"&gt;}}&lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;_{{subnet name}})&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;network_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;flat                  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;flat)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;segmentation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;21                 (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;subnet02&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.0/24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.254            (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;allocation_pools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;(optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;- start&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.10&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;end&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.250&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;enable_dhcp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false                  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;                             &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;(optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[]&lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;- destination&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.0/24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;nexthop&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;vlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;20                            (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;physical_network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet02  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt;name.lower&lt;/span&gt;&lt;span class="p p-Indicator"&gt;}}&lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;_{{subnet name}})&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;network_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;flat                  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;flat)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;segmentation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;20                 (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="example-network-data-v2-file-for-ipv6"&gt;
&lt;h4&gt;Example network data v2 file for IPv6&lt;/h4&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;name_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;admin_state_up&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;dns_domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage.localdomain.&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;mtu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1442&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;shared&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;vip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;subnets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;subnet01&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::/64&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;gateway_ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_allocation_pools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::0010&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::fff9&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;enable_dhcp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;routes_ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::/64&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;nexthop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_address_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;null&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_ra_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;null&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;vlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;21&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;physical_network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet01  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt;name.lower&lt;/span&gt;&lt;span class="p p-Indicator"&gt;}}&lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;_{{subnet name}})&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;network_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;flat                  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;flat)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;segmentation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;21                 (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;subnet02&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::/64&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;gateway_ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_allocation_pools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::0010&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::fff9&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;enable_dhcp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;routes_ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::/64&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;nexthop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_address_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;null&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_ra_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;null&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;vlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;20&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;physical_network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage_subnet02  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt;name.lower&lt;/span&gt;&lt;span class="p p-Indicator"&gt;}}&lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;_{{subnet name}})&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;network_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;flat                  (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;flat)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;segmentation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;20                 (optional, default&lt;/span&gt;&lt;span class="p p-Indicator"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undef)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="example-network-data-v2-file-for-dual-stack"&gt;
&lt;h4&gt;Example network data v2 file for dual stack&lt;/h4&gt;
&lt;p&gt;A dual-stack (IPv4/IPv6) network has two subnets per segment, one for IPv4 and
the other for IPv6. A single neutron port with an IP address in each subnet can be created.&lt;/p&gt;
&lt;p&gt;In this case the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ipv6&lt;/span&gt;&lt;/code&gt; key controls whether services are configured to
bind to IPv6 or IPv4 (default: ipv6: false).&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;name_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;admin_state_up&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;dns_domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;storage.localdomain.&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;mtu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;1442&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;shared&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;true                            (default ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false)&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;vip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;subnets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;subnet01&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.0/24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;allocation_pools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.10&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.250&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.0/24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;nexthop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::/64&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;gateway_ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_allocation_pools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::0010&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::fff9&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;routes_ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::/64&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;nexthop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;vlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;21&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;subnet02&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.0/24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;gateway_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;allocation_pools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.10&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.250&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.1.0/24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;nexthop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;172.18.0.254&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::/64&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;gateway_ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;ipv6_allocation_pools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::0010&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::fff9&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;routes_ipv6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:a::/64&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;nexthop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2001:db8:b::1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;vlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;20&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
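&lt;p&gt;The dual-stack layout above can be sanity-checked before it is handed to any
provisioning tooling. The following is a minimal sketch, not part of this spec:
it assumes the file has already been parsed into Python mappings, and the helper
name check_subnet is hypothetical. It only verifies that each gateway and
allocation pool lies inside its subnet, using the key names shown in the
examples.&lt;/p&gt;

```python
import ipaddress

# Key names per address family, as they appear in the examples above.
FAMILIES = (
    ("ip_subnet", "gateway_ip", "allocation_pools"),
    ("ipv6_subnet", "gateway_ipv6", "ipv6_allocation_pools"),
)


def check_subnet(subnet):
    """Return a list of human-readable problems for one subnet mapping.

    Hypothetical helper for illustration only; not a TripleO API.
    """
    problems = []
    for cidr_key, gateway_key, pools_key in FAMILIES:
        if cidr_key not in subnet:
            continue  # this address family is not configured
        network = ipaddress.ip_network(subnet[cidr_key])
        gateway = subnet.get(gateway_key)
        if gateway is not None and ipaddress.ip_address(gateway) not in network:
            problems.append("%s %s is outside %s" % (gateway_key, gateway, network))
        for pool in subnet.get(pools_key, []):
            start = ipaddress.ip_address(pool["start"])
            end = ipaddress.ip_address(pool["end"])
            if start not in network or end not in network:
                problems.append("%s entry is outside %s" % (pools_key, network))
            elif max(start, end) != end:
                problems.append("%s entry starts after it ends" % pools_key)
    return problems


# subnet01 from the dual-stack example, written as a Python mapping.
subnet01 = {
    "ip_subnet": "172.18.1.0/24",
    "gateway_ip": "172.18.1.254",
    "allocation_pools": [{"start": "172.18.1.10", "end": "172.18.1.250"}],
    "ipv6_subnet": "2001:db8:a::/64",
    "gateway_ipv6": "2001:db8:a::1",
    "ipv6_allocation_pools": [
        {"start": "2001:db8:a::0010", "end": "2001:db8:a::fff9"}
    ],
    "vlan": 21,
}

print(check_subnet(subnet01))
```

&lt;p&gt;An empty list means no problems were found. Route entries could be checked the
same way, but which constraints apply to them is left open here.&lt;/p&gt;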
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Not changing the network data format&lt;/p&gt;
&lt;p&gt;In this case we need an alternative way to provide the values for resource
properties currently managed using heat parameters when management of
the network resources moves outside the heat stack.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Only add new keys for properties&lt;/p&gt;
&lt;p&gt;Keep the concept of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;base&lt;/span&gt;&lt;/code&gt; subnet at level-0, and only add keys
for properties currently managed using heat parameters.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h2&gt;Security Impact&lt;/h2&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h2&gt;Upgrade Impact&lt;/h2&gt;
&lt;p&gt;If and when we remove the capability to manage network resources in the
overcloud heat stack, the user must run the export command to generate
a new network data v2 file. This file is then used as input to the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt;
&lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;network&lt;/span&gt; &lt;span class="pre"&gt;provision&lt;/span&gt;&lt;/code&gt; command to generate the environment file
required for a heat stack without network resources.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h2&gt;Performance Impact&lt;/h2&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The network data v2 format must be documented. Procedures to use the commands
to export network information from existing deployments as well as
procedures to provision/update/adopt network resources with the non-heat stack
tooling must be provided.&lt;/p&gt;
&lt;p&gt;Heat parameters which will be deprecated/removed:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;{{network.name}}NetValueSpecs&lt;/span&gt;&lt;/code&gt;: Deprecated, Removed.
This was used to set &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;provider:physical_network&lt;/span&gt;&lt;/code&gt; and
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;provider:network_type&lt;/span&gt;&lt;/code&gt;, or actually &lt;strong&gt;any&lt;/strong&gt; network property.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;{{network.name}}NetShared&lt;/span&gt;&lt;/code&gt;: Deprecated, replaced by network level
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;shared&lt;/span&gt;&lt;/code&gt; (bool)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;{{network.name}}NetAdminStateUp&lt;/span&gt;&lt;/code&gt;: Deprecated, replaced by network
level &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;admin_state_up&lt;/span&gt;&lt;/code&gt; (bool)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;{{network.name}}NetEnableDHCP&lt;/span&gt;&lt;/code&gt;: Deprecated, replaced by subnet
level &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;enable_dhcp&lt;/span&gt;&lt;/code&gt; (bool)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;IPv6AddressMode&lt;/span&gt;&lt;/code&gt;: Deprecated, replaced by subnet level
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ipv6_address_mode&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;IPv6RAMode&lt;/span&gt;&lt;/code&gt;: Deprecated, replaced by subnet level &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ipv6_ra_mode&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
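&lt;p&gt;The replacements in the list above can be read as a lookup table. The following is a minimal sketch, not part of TripleO: the &lt;code&gt;PREFIXED&lt;/code&gt; and &lt;code&gt;GLOBAL&lt;/code&gt; tables and the &lt;code&gt;translate_heat_parameters&lt;/code&gt; helper are hypothetical names used only for illustration, and the network name and values are made up. &lt;code&gt;NetValueSpecs&lt;/code&gt; is left out because it has no single replacement key.&lt;/p&gt;

```python
# Hypothetical lookup tables built from the list above; not a TripleO API.
# Each entry maps a heat parameter suffix to (level, network data v2 key).
PREFIXED = {
    "NetShared": ("network", "shared"),
    "NetAdminStateUp": ("network", "admin_state_up"),
    "NetEnableDHCP": ("subnet", "enable_dhcp"),
}
# Parameters that carry no network name prefix.
GLOBAL = {
    "IPv6AddressMode": ("subnet", "ipv6_address_mode"),
    "IPv6RAMode": ("subnet", "ipv6_ra_mode"),
}


def translate_heat_parameters(network_name, heat_parameters):
    """Return network- and subnet-level v2 keys for one network."""
    result = {"network": {}, "subnet": {}}
    for suffix, (level, key) in PREFIXED.items():
        param = network_name + suffix
        if param in heat_parameters:
            result[level][key] = heat_parameters[param]
    for param, (level, key) in GLOBAL.items():
        if param in heat_parameters:
            result[level][key] = heat_parameters[param]
    return result


# Illustrative input only; the network name and values are assumptions.
example = translate_heat_parameters(
    "InternalApi",
    {
        "InternalApiNetShared": False,
        "InternalApiNetEnableDHCP": True,
        "IPv6RAMode": "dhcpv6-stateless",
    },
)
print(example)
```

&lt;p&gt;Keeping the mapping as data rather than code makes it straightforward to review against the list of deprecated parameters.&lt;/p&gt;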
&lt;p&gt;Once deployed_networks.yaml (&lt;a class="reference external" href="https://review.opendev.org/751876"&gt;https://review.opendev.org/751876&lt;/a&gt;) is used the
following parameters are Deprecated, since they will no longer be used:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;{{network.name}}NetCidr&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{network.name}}SubnetName&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{network.name}}Network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{network.name}}AllocationPools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{network.name}}Routes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{network.name}}SubnetCidr_{{subnet}}&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{network.name}}AllocationPools_{{subnet}}&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;{{network.name}}Routes_{{subnet}}&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Harald Jensås&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add tags to resources using heat stack - &lt;a class="reference external" href="https://review.opendev.org/750666"&gt;https://review.opendev.org/750666&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tools to extract provisioned networks from existing deployment
&lt;a class="reference external" href="https://review.opendev.org/750671"&gt;https://review.opendev.org/750671&lt;/a&gt;, &lt;a class="reference external" href="https://review.opendev.org/750672"&gt;https://review.opendev.org/750672&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New tooling to provision/update/adopt networks
&lt;a class="reference external" href="https://review.opendev.org/751739"&gt;https://review.opendev.org/751739&lt;/a&gt;, &lt;a class="reference external" href="https://review.opendev.org/751875"&gt;https://review.opendev.org/751875&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployed networks template in THT - &lt;a class="reference external" href="https://review.opendev.org/751876"&gt;https://review.opendev.org/751876&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
</description><pubDate>Mon, 05 Oct 2020 00:00:00 </pubDate></item><item><title>Disable Swift from the Undercloud</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/excise-swift.html</link><description>
 
&lt;p&gt;The goal of this proposal is to introduce the community to the idea of
disabling Swift on the TripleO Undercloud. Within this proposal we intend
to provide a high-level overview of how we can accomplish this goal.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Swift is being used to store objects related to the deployment which are
managed entirely on the Undercloud. In the past, there was an API / UI to
interact with the deployment tooling; however, with the deprecation of the UI
and the removal of Mistral this is no longer the case. The Undercloud is
assumed to be a single node which is used to deploy OpenStack clouds, and
requires the user to login to the node to run commands. Because we’re no longer
attempting to make the Undercloud a distributed system there’s no need for an
API’able distributed storage service. Swift, in its current state, is
under-utilized and carries unnecessary operational and resource overhead.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Decommission Swift from the Undercloud.&lt;/p&gt;
&lt;p&gt;To decommission Swift, we’ll start by removing all of the &lt;cite&gt;tripleoclient&lt;/cite&gt; Swift
interactions. These interactions are largely storing and retrieving YAML files
which provide context to the user for current deployment status. To ensure
we’re not breaking deployment expectations, we’ll push everything to the local
file system and retain all of the file properties wherever possible. We will
need to coordinate with tripleo-ansible to ensure we’re making all direct Swift
client and module interactions optional.&lt;/p&gt;
&lt;p&gt;Once we’re able to remove the &lt;cite&gt;tripleoclient&lt;/cite&gt; Swift interactions, we’ll move to
disable Swift interactions from tripleo-common. These interactions are similar
to the ones found within the &lt;cite&gt;tripleoclient&lt;/cite&gt;, though tripleo-common has some
complexity; we’ll need to ensure we’re not breaking expectations we’ve created
with our puppet deployment methodologies which have some Swift assumptions.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We keep everything as-is.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;There should be no significant security implications when disabling Swift.
It could be argued that disabling Swift might make the deployment more secure,
since it will lessen the attack surface; however, given the fact that Swift on the
Undercloud is only used by director I would consider any benefit insignificant.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;There will be no upgrade impact; this change will be transparent to the
end-user.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Disabling Swift could make some client interactions faster; however, the
benefit should be negligible. That said, disabling Swift would remove a
service on the Undercloud, which would make setup faster and reduce the
resources required to run the Undercloud.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Operationally we should see an improvement as it will no longer be required to
explore a Swift container, and download files to debug different parts of the
deployment. All deployment related file artifacts housed within Swift will
exist on the Undercloud using the local file system, and should be easily
interacted with.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None. If anything, disabling Swift should make the life of a TripleO developer
easier.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Excising Swift client interactions will be handled directly in as few reviews
as possible; hopefully allowing us to backport this change, should it be deemed
valuable to stable releases.&lt;/p&gt;
&lt;p&gt;All of the objects stored within Swift will be stored in
&lt;cite&gt;/var/lib/tripleo/{named_artifact_directories}&lt;/cite&gt;. This will allow us to
implement all of the same core logic in our various libraries just without the
use of the API call to store the object.&lt;/p&gt;
&lt;p&gt;In terms of enabling us to eliminate swift without having a significant impact
on the internal API we’ll first start by trying to replace the swift object
functions within tripleo-common with local file system calls. By using the
existing functions and replacing the backend we’ll ensure API compatibility and
lessen the likelihood of creating regressions.&lt;/p&gt;
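&lt;p&gt;The idea of keeping the existing function signatures while swapping the backend can be sketched as follows. This is only an illustration under stated assumptions: the &lt;code&gt;LocalObjectStore&lt;/code&gt; class, its method names, and the container and object names are hypothetical and are not the actual tripleo-common functions, and a temporary directory stands in for &lt;code&gt;/var/lib/tripleo&lt;/code&gt;.&lt;/p&gt;

```python
import os
import tempfile


class LocalObjectStore:
    """Hypothetical stand-in for Swift-style put/get calls on local disk."""

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def _path(self, container, name):
        # One directory per former Swift container.
        return os.path.join(self.base_dir, container, name)

    def put_object(self, container, name, contents):
        path = self._path(container, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(contents)
        return path

    def get_object(self, container, name):
        with open(self._path(container, name), encoding="utf-8") as handle:
            return handle.read()


# A temporary directory stands in for /var/lib/tripleo in this example.
base = tempfile.mkdtemp()
store = LocalObjectStore(base)
store.put_object("overcloud", "plan-environment.yaml", "version: 1.0\n")
roundtrip = store.get_object("overcloud", "plan-environment.yaml")
print(roundtrip)
```

&lt;p&gt;Because callers only see put and get operations, the same call sites could keep working whether the objects live in Swift or on the local file system.&lt;/p&gt;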
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;We’ll need to collaborate with various groups to ensure we’re porting assumed
functionality correctly. While this spec will not go into the specific
implementation details for porting assumed functionality, it should be known
that we will be accountable for ensuring existing functionality is ported
appropriately.&lt;/p&gt;
&lt;/div&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;cloudnull&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Other contributors:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;emilien&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ekultails&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;The work items listed here are high level, and not meant to provide specific
implementation details or timelines.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Enumerate all of the Swift interactions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a space on the Undercloud to house the files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This location will be on the local file system and will be set up as a
git archive; git is used for easier debugging and rapid rollback, and will
provide simple versioning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an option to disable Swift on the Undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Convert client interactions to using the local file system&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure all tripleo-ansible Swift client calls are made optional&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Convert tripleo-common Swift interactions to using the local file system&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Disable Swift on the Undercloud&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Before Swift can be disabled on the Undercloud we will need to ensure the
deployment methodology has been changed to Metalsmith.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The Swift tests will need to be updated to use the local file system; however,
the existing tests and test structure will be reused.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;There are several references to Swift in our documentation which we will need to
update.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.opendev.org/p/tripleo-heat-swift-removal-undercloud"&gt;https://etherpad.opendev.org/p/tripleo-heat-swift-removal-undercloud&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://paste.openstack.org/show/798208"&gt;http://paste.openstack.org/show/798208&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Tue, 15 Sep 2020 00:00:00 </pubDate></item><item><title>Podman support for container management</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/podman.html</link><description>
 
&lt;p&gt;Launchpad blueprint:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/podman-support"&gt;https://blueprints.launchpad.net/tripleo/+spec/podman-support&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There is an ongoing desire to manage TripleO containers with a set of tools
designed to solve complex problems when deploying applications.
The containerization of TripleO started with a Docker CLI implementation
but we are looking at how we could move container orchestration
to a Kubernetes-friendly solution.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;There are three problems that this document will cover:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;There is an ongoing discussion on whether or not Docker will be
maintained on future versions of Red Hat platforms. There is a general
move toward OCI (Open Containers Initiative) conformant runtimes, such as CRI-O
(Container Runtime Interface for OCI).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The TripleO community has been looking at how we could orchestrate the
containers lifecycle with Kubernetes, in order to bring consistency with
other projects like OpenShift for example.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The TripleO project aims to work on the next version of Red Hat platforms,
therefore we are looking at Docker alternatives in the Stein cycle.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="introduction"&gt;
&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;The containerization of TripleO has been an ongoing effort for a few releases
now, and we’ve always taken a step-by-step approach that tries to
maintain backward compatibility for deployers and developers, and
that keeps upgrades from a previous release possible without too much
pain. With that said, we are looking at a proposed change that isn’t too
disruptive but is still aligned with the general roadmap of the container
story and hopefully will lead us to manage our containers with Kubernetes.
We use the Paunch project to provide an abstraction in our container integration.
Paunch handles container configuration formats and supports multiple backends.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="integrate-podman-cli"&gt;
&lt;h3&gt;Integrate Podman CLI&lt;/h3&gt;
&lt;p&gt;The goal of Podman is to allow users to run standalone (non-orchestrated)
containers which is what we have been doing with Docker until now.
Podman also allows users to run groups of containers called Pods where a Pod is
a term developed for the Kubernetes Project which describes an object that
has one or more containerized processes sharing multiple namespaces
(Network, IPC and optionally PID).
Podman doesn’t have any daemon, which makes it lighter than Docker, and it uses a
more traditional fork/exec model of Unix and Linux.
The container runtime used by Podman is runc.
The CLI has a partial backward compatibility with Docker so its integration
in TripleO shouldn’t be that painful.&lt;/p&gt;
&lt;p&gt;It is proposed to add support for Podman CLI (alongside Docker CLI) in TripleO
to manage the creation, deletion, and inspection of our containers.
We would have a new parameter called ContainerCli in TripleO that, if set to
‘podman’, causes container provisioning to be done with Podman CLI instead of
Docker CLI.&lt;/p&gt;
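&lt;p&gt;As a rough illustration of how a single ContainerCli value could select the binary used for container operations, consider the sketch below. The &lt;code&gt;container_command&lt;/code&gt; helper and the container name are hypothetical; this is not how Paunch or TripleO implement the switch, only an assumption about the general shape.&lt;/p&gt;

```python
# Values the spec says will be supported for ContainerCli.
SUPPORTED_CLIS = ("docker", "podman")


def container_command(container_cli, action, name):
    """Build an argument list for the selected container CLI (hypothetical)."""
    if container_cli not in SUPPORTED_CLIS:
        raise ValueError("unsupported ContainerCli value: %s" % container_cli)
    # Both CLIs share sub-commands such as inspect and rm, so only the
    # binary name changes in this simplified example.
    return [container_cli, action, name]


inspect_cmd = container_command("podman", "inspect", "keystone")
print(inspect_cmd)
```

&lt;p&gt;Returning an argument list rather than a shell string avoids quoting issues if the list is later passed to a process launcher.&lt;/p&gt;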
&lt;p&gt;Because there is no daemon, there are some problems that we need to solve:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Automatically restart failed containers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatically start containers when the host is (re)booted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Start the containers in a specific order during host boot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a channel of communication with containers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run container healthchecks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To solve the first 3 problems, it is proposed to use Systemd:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Use Restart so we can configure a restart policy for our containers.
Most of our containers would run with Restart=always policy, but we’ll
have to support some exceptions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The systemd services will be enabled by default so the containers start
at boot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The ordering will be managed by Wants which provides Implicit Dependencies
in Systemd. Wants is a weaker version of Requires. It allows us to make sure
we start HAproxy before Keepalived for example, if they are on the same host.
Because it is a weak dependency, it will only be honored if the containers
are running on the same host.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The way containers will be managed (start/stop/restart/status) will be
familiar to operators who are used to controlling Systemd services. However,
we probably want to make it clear that managing the containers with Systemd
is not our long-term goal.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
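&lt;p&gt;The Systemd settings listed above can be sketched as a small unit file generator. This is a hedged illustration, not the unit that Paunch actually writes: the &lt;code&gt;render_unit&lt;/code&gt; helper, the unit names, and the stop timeout are assumptions. Note that systemd documents &lt;code&gt;Wants=&lt;/code&gt; as a requirement dependency that does not by itself affect start order, so the sketch also emits &lt;code&gt;After=&lt;/code&gt;; that line is an addition here and is not mentioned in the spec.&lt;/p&gt;

```python
def render_unit(name, wants=(), restart="always"):
    """Render a minimal systemd unit wrapping an existing Podman container."""
    lines = ["[Unit]", "Description=%s container" % name]
    for dependency in wants:
        # Wants= pulls the other unit in; After= provides the ordering.
        lines.append("Wants=%s.service" % dependency)
        lines.append("After=%s.service" % dependency)
    lines += [
        "",
        "[Service]",
        "Restart=%s" % restart,
        # Attach to the container so systemd can track the main process.
        "ExecStart=/usr/bin/podman start -a %s" % name,
        "ExecStop=/usr/bin/podman stop -t 10 %s" % name,
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: start HAproxy before Keepalived on the same host.
unit_text = render_unit("keepalived", wants=["haproxy"])
print(unit_text)
```

&lt;p&gt;The &lt;code&gt;[Install]&lt;/code&gt; section is what lets the service be enabled so the container starts at boot, and the &lt;code&gt;restart&lt;/code&gt; argument leaves room for the exceptions to the Restart=always policy.&lt;/p&gt;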
&lt;p&gt;The Systemd integration would be:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;complete enough to cover our use-cases and bring feature parity with the
Docker implementation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;light enough to be able to migrate our container lifecycle to Kubernetes
in the future (e.g. CRI-O).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the fourth problem, we are still investigating the options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;varlink: interface description format and protocol that aims to make services
accessible to both humans and machines in the simplest feasible way.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CRI-O: OCI-based implementation of Kubernetes Container Runtime Interface
without Kubelet. For example, we could use a CRI-O Python binding to
communicate with the containers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A dedicated image which runs the rootwrap daemon, with rootwrap filters to only run the allowed
commands.  The controlling container will have the rootwrap socket mounted in so that it can
trigger allowed calls in the rootwrap container.  For pacemaker, the rootwrap container will allow
image tagging. For neutron, the rootwrap container will spawn the processes inside the container,
so it will need to be a long-lived container that is managed outside paunch.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;pre&gt;+---------+     +----------+
|         |     |          |
| L3Agent +-----+ Rootwrap |
|         |     |          |
+---------+     +----------+&lt;/pre&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;In this example, the L3Agent container has mounted in the rootwrap daemon socket so that it can
run allowed commands inside the rootwrap container.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, the fifth problem is still an ongoing question.
There are some plans to support healthchecks in Podman but nothing has been
done as of today. We might have to implement something on our side with
Systemd.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h2&gt;Alternatives&lt;/h2&gt;
&lt;p&gt;Two alternatives are proposed.&lt;/p&gt;
&lt;section id="cri-o-integration"&gt;
&lt;h3&gt;CRI-O Integration&lt;/h3&gt;
&lt;p&gt;CRI-O is meant to provide an integration path between OCI conformant runtimes
and the kubelet. Specifically, it implements the Kubelet Container Runtime
Interface (CRI) using OCI conformant runtimes. Note that the CLI utility for
interacting with CRI-O isn’t meant to be used in production, so managing
the containers lifecycle with a CLI is only possible with Docker or Podman.&lt;/p&gt;
&lt;p&gt;So instead of a smooth migration from Docker CLI to Podman CLI, we could go
straight to Kubernetes integration and convert our TripleO services to work
with a standalone Kubelet managed by CRI-O.
We would have to generate YAML files for each container in a Pod format,
so CRI-O can manage them.
It wouldn’t require Systemd integration, as the containers will be managed
by Kubelet.
The operator would control the container lifecycle by using kubectl commands
and the automated deployment &amp;amp; upgrade process would happen in Paunch with
a Kubelet backend.&lt;/p&gt;
&lt;p&gt;While this implementation will help us to move to a multi-node Kubernetes
friendly environment, it remains the riskiest option in terms of the
quantity of work that needs to happen versus the time that we have to design,
implement, test and ship the next tooling before the end of the Stein cycle.&lt;/p&gt;
&lt;p&gt;We also need to keep in mind that CRI-O and Podman share containers/storage
and containers/image libraries, so the issues that we have had with Podman
will be hit with CRI-O as well.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="keep-docker"&gt;
&lt;h3&gt;Keep Docker&lt;/h3&gt;
&lt;p&gt;We could keep Docker around and not change anything in the way we manage
containers. We could also keep Docker and make it work with CRI-O.
The only risk here is that Docker tooling might not be supported in the future
by Red Hat platforms and we would be on our own if any issue arises with Docker.
The TripleO community always seeks a healthy and long-term
collaboration with the project communities that we interact
with.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="proposed-roadmap"&gt;
&lt;h2&gt;Proposed roadmap&lt;/h2&gt;
&lt;p&gt;In Stein:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Make Paunch support Podman as an alternative to Docker.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get our existing services fully deployable on Podman, with parity to
what we had with Docker.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If we have time, add Podman pod support to Paunch&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In “T” cycle:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Rewrite all of our container yaml to the pod format.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add a Kubelet backend to Paunch (or change our agent tooling to call
Kubelet directly from Ansible).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get our existing services fully deployable via Kubelet, with parity to
what we had with Podman / Docker.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Evaluate switching to Kubernetes proper.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h2&gt;Security Impact&lt;/h2&gt;
&lt;p&gt;The TripleO containers will rely on Podman security.
If we don’t use CRI-O or varlink to communicate with containers, we’ll have
to consider running some containers in privileged mode and mounting
/var/lib/containers into the containers. This is a security concern and
we’ll have to evaluate it.
Also, we’ll have to make the proposed solution work with SELinux in Enforcing mode.&lt;/p&gt;
&lt;p&gt;The Docker solution doesn’t enforce SELinux separation between containers.
Podman does, and there’s currently no easy way to deactivate that globally.
So we’ll basically get more secure containers with Podman, as we have to
support separation from the very beginning.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h2&gt;Upgrade Impact&lt;/h2&gt;
&lt;p&gt;The containers that were managed by Docker Engine will be removed and
provisioned into the new runtime. This process will happen when Paunch
generates and executes the new container configuration.
The operator shouldn’t have to do any manual action and the migration will be
automated, mainly by Paunch.
The Containerized Undercloud upgrade job will test the upgrade of an Undercloud
running Docker containers on Rocky to Podman containers on Stein.
The Overcloud upgrade jobs will also test this migration.&lt;/p&gt;
&lt;p&gt;Note: as the docker runtime doesn’t have the SELinux separation,
some chcon/relabelling might be needed prior to the move to the podman runtime.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="end-user-impact"&gt;
&lt;h2&gt;End User Impact&lt;/h2&gt;
&lt;p&gt;The operators won’t be able to run Docker CLI like before and instead will
have to use Podman CLI, where some backward compatibility is guaranteed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h2&gt;Performance Impact&lt;/h2&gt;
&lt;p&gt;There are different aspects of performance that we’ll need to investigate:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Container performance (relying on Podman).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How Systemd + Podman work together and how restarts work versus the Docker engine.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="deployer-impact"&gt;
&lt;h2&gt;Deployer Impact&lt;/h2&gt;
&lt;p&gt;There shouldn’t be much impact for the deployer, as we aim to make this change
as transparent as possible. The only option (so far) that will be
exposed to the deployer will be “ContainerCli”, where only ‘docker’ and
‘podman’ will be supported. If ‘podman’ is chosen, the transition will be
automated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h2&gt;Developer Impact&lt;/h2&gt;
&lt;p&gt;There shouldn’t be much impact for the developer of TripleO services, except
that some things in Podman differ slightly from Docker. For example, Podman
won’t create missing directories when bind-mounting into containers, whereas
Docker creates them.&lt;/p&gt;
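&lt;p&gt;One way a service could cope with this difference is to create the host side of each bind mount before the container starts. The sketch below is an assumption about how that might look, not the fix adopted in TripleO: the &lt;code&gt;ensure_bind_mount_sources&lt;/code&gt; helper and the paths are hypothetical, every source is assumed to be a directory rather than a file, and a temporary directory stands in for the host file system root.&lt;/p&gt;

```python
import os
import tempfile


def ensure_bind_mount_sources(volumes):
    """Create missing host directories for 'host:container[:opts]' volumes."""
    created = []
    for volume in volumes:
        # The host path is the first colon-separated field.
        host_path = volume.split(":")[0]
        if not os.path.exists(host_path):
            os.makedirs(host_path)
            created.append(host_path)
    return created


# A temporary directory stands in for the host file system root.
root = tempfile.mkdtemp()
log_dir = os.path.join(root, "var", "log", "containers", "nova")
created = ensure_bind_mount_sources([log_dir + ":/var/log/nova:z"])
print(created)
```

&lt;p&gt;Running the helper a second time returns an empty list, so it is safe to call on every deployment.&lt;/p&gt;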
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="contributors"&gt;
&lt;h3&gt;Contributors&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Bogdan Dobrelya&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cédric Jeanneret&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Emilien Macchi&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Steve Baker&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Update TripleO services to work with Podman (e.g. fix bind-mounts issues).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SELinux separation (relates to bind-mounts rights + some other issues when
we’re calling iptables/other host command from a container)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Systemd integration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Healthcheck support.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Socket / runtime: varlink? CRI-O?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upgrade workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Testing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation for operators.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The Podman integration depends a lot on how stable the tool is and how
often it is released and shipped, so that we can test it in CI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Healthchecks interface depends on Podman’s roadmap.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;First of all, we’ll switch the Undercloud jobs to use Podman and this work
should be done by milestone-1. Both the deployment and upgrade jobs should
be switched and actually working.
The overcloud jobs should be switched by milestone-2.&lt;/p&gt;
&lt;p&gt;We’ll keep Docker testing support for as long as we keep testing on the
CentOS 7 platform.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We’ll need to document the new commands (mainly the same as Docker), and
the differences in how containers should be managed (Systemd instead of Docker
CLI for example).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.projectatomic.io/blog/2018/02/reintroduction-podman/"&gt;https://www.projectatomic.io/blog/2018/02/reintroduction-podman/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/kubernetes-sigs/cri-o"&gt;https://github.com/kubernetes-sigs/cri-o&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/kubernetes/community/blob/master/contributors/devel/container-runtime-interface.md"&gt;https://github.com/kubernetes/community/blob/master/contributors/devel/container-runtime-interface.md&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://varlink.org/"&gt;https://varlink.org/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/containers/libpod/blob/master/transfer.md"&gt;https://github.com/containers/libpod/blob/master/transfer.md&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-standalone-kubelet-poc"&gt;https://etherpad.openstack.org/p/tripleo-standalone-kubelet-poc&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Tue, 09 Jun 2020 00:00:00 </pubDate></item><item><title>Improve logging for ansible calls in tripleoclient</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/xena/ansible-logging-tripleoclient.html</link><description>
 
&lt;p&gt;Launchpad blueprint:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/ansible-logging-tripleoclient"&gt;https://blueprints.launchpad.net/tripleo/+spec/ansible-logging-tripleoclient&lt;/a&gt;&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem description&lt;/h2&gt;
&lt;p&gt;Currently, the ansible playbook logging shown during a deploy or day-2
operations such as upgrade, update, or scaling is either too verbose or not
verbose enough.&lt;/p&gt;
&lt;p&gt;Furthermore, since we’re moving to ephemeral services on the Undercloud (see
&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/ephemeral-heat-overcloud.html"&gt;ephemeral heat&lt;/a&gt; for instance), getting information about the state, content
and related things is a bit less intuitive. Proper logging, with an associated
CLI, can really improve that situation and provide a better user experience.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="requirements-for-the-solution"&gt;
&lt;h2&gt;Requirements for the solution&lt;/h2&gt;
&lt;section id="no-new-service-addition"&gt;
&lt;h3&gt;No new service addition&lt;/h3&gt;
&lt;p&gt;We are already trying to remove services from the Undercloud, such as Mistral;
the goal is not to add new ones.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="no-increase-in-deployment-and-day-2-operations-time"&gt;
&lt;h3&gt;No increase in deployment and day-2 operations time&lt;/h3&gt;
&lt;p&gt;The solution must not increase the time taken for deploy, update, upgrades,
scaling and any other day-2 operations. It must be 100% transparent to the
operator.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="use-existing-tools"&gt;
&lt;h3&gt;Use existing tools&lt;/h3&gt;
&lt;p&gt;In the same way we don’t want to have new services, we don’t want to reinvent
the wheel once more, and we must check the already huge catalog of existing
solutions.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="kiss"&gt;
&lt;h3&gt;KISS&lt;/h3&gt;
&lt;p&gt;Keep It Simple Stupid is a key element - code must be easy to understand and
maintain.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="introduction"&gt;
&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;While working on the &lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/stein/validation-framework.html"&gt;Validation Framework&lt;/a&gt;, a big part of the effort was about logging.
There, we found a way to get machine-readable output and store it in a
defined location, which allows us to provide a nice interface to list and
show validation runs.&lt;/p&gt;
&lt;p&gt;This heavily relies on an Ansible callback plugin with specific libs, which are
shipped in the &lt;a class="reference external" href="https://opendev.org/openstack/validations-libs"&gt;python-validations-libs&lt;/a&gt; package.&lt;/p&gt;
&lt;p&gt;Since the approach is modular, those libs can be re-used pretty easily in other
projects.&lt;/p&gt;
&lt;p&gt;In addition, python-tripleoclient already depends on &lt;a class="reference external" href="https://opendev.org/openstack/validations-libs"&gt;python-validations-libs&lt;/a&gt;
(via a dependency on validations-common), meaning we already have the needed
bits.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="the-idea"&gt;
&lt;h3&gt;The Idea&lt;/h3&gt;
&lt;p&gt;Since we have the mandatory code already present on the system (provided by the
new &lt;a class="reference external" href="https://opendev.org/openstack/validations-libs"&gt;python-validations-libs&lt;/a&gt; package), we can modify how ansible-runner is
configured in order to inject a callback, and get the output we need in both
the shell (direct feedback to the operator) and in a dedicated file.&lt;/p&gt;
&lt;p&gt;Since callbacks aren’t free (but hopefully not expensive either), a proper PoC
must be conducted in order to gather metrics about CPU, RAM and time. Please
see the Performance Impact section.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="direct-feedback"&gt;
&lt;h3&gt;Direct feedback&lt;/h3&gt;
&lt;p&gt;The direct feedback will tell the operator about the current task being done
and, when it ends, if it’s a success or not.&lt;/p&gt;
&lt;p&gt;Using a callback might provide a “human suited” output.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="file-logging"&gt;
&lt;h3&gt;File logging&lt;/h3&gt;
&lt;p&gt;Here, we must define multiple things, and take into account we’re running
multiple playbooks, with multiple calls to ansible-runner.&lt;/p&gt;
&lt;section id="file-location"&gt;
&lt;h4&gt;File location&lt;/h4&gt;
&lt;p&gt;Nowadays, most if not all of the deploy related files are located in the
user home directory (i.e. ~/overcloud-deploy/&amp;lt;stack&amp;gt;/).
It therefore sounds reasonable to get the log in the same location, or a
subdirectory in that location.&lt;/p&gt;
&lt;p&gt;Keeping this location also solves the potential access rights issue, since a
standard home directory has a 0700 mode, preventing any other user from accessing
its content.&lt;/p&gt;
&lt;p&gt;We might even go a bit further and enforce a 0600 mode on the log files, just
to be sure.&lt;/p&gt;
&lt;p&gt;Remember, logs might include sensitive data, especially when we’re running with
extra debugging.&lt;/p&gt;
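&lt;p&gt;The access rules above can be sketched as follows. This is a minimal
illustration, not the tripleoclient implementation: the function names and the
"ansible-logs" subdirectory are hypothetical, and a POSIX system is assumed.&lt;/p&gt;

```python
import os


def prepare_log_dir(base_dir, stack):
    """Create base_dir/stack/ansible-logs with owner-only access (0700)."""
    # "ansible-logs" is a hypothetical subdirectory name used for illustration.
    log_dir = os.path.join(base_dir, stack, "ansible-logs")
    os.makedirs(log_dir, mode=0o700, exist_ok=True)
    # makedirs honours the umask and leaves an existing directory untouched,
    # so the mode is enforced explicitly.
    os.chmod(log_dir, 0o700)
    return log_dir


def open_log_file(log_dir, name):
    """Open a log file for appending, readable and writable by its owner only."""
    path = os.path.join(log_dir, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    # Also tighten the mode of a file that already existed.
    os.fchmod(fd, 0o600)
    return os.fdopen(fd, "a")
```

&lt;p&gt;Setting the mode at creation time, rather than with a later chmod alone,
avoids a window during which the file is readable by other users.&lt;/p&gt;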
&lt;/section&gt;
&lt;section id="file-format-convention"&gt;
&lt;h4&gt;File format convention&lt;/h4&gt;
&lt;p&gt;In order to make the logs easily usable by automated tools, and since we
already heavily rely on JSON, the log output should be formatted as JSON. This
would allow adding new CLI commands such as “history list”, “history show”
and so on.&lt;/p&gt;
&lt;p&gt;Also, since JSON is well supported by logging services such as Elasticsearch,
using it makes sending the logs to a central logging service easy and convenient.&lt;/p&gt;
&lt;p&gt;While JSON is nice, it will most likely prevent a straight read by the
operator - but with a working CLI, we might get something closer to what we
have in the &lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/stein/validation-framework.html"&gt;Validation Framework&lt;/a&gt;, for instance (see &lt;a class="reference external" href="https://asciinema.org/a/283645"&gt;this example&lt;/a&gt;). We
might even consider a CLI that can convert from JSON to whatever format
the operator might want, including but not limited to XML, plain text or JUnit
(Jenkins).&lt;/p&gt;
&lt;p&gt;There should be a new parameter to switch the format from “plain” to
“json” - the default value is still subject to discussion, but providing this
parameter will ensure operators can choose whichever format they want. The
consensus seems to be “default to plain”.&lt;/p&gt;
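&lt;p&gt;As a rough sketch of such a format switch, assuming a log event is a simple
dictionary: the field names (timestamp, playbook, task, status) and the
function name are hypothetical and only serve to illustrate the “plain” versus
“json” choice.&lt;/p&gt;

```python
import json


def format_event(event, fmt="plain"):
    """Render one log event as a plain-text line or as a JSON document."""
    if fmt == "json":
        # sort_keys gives a stable key order, which helps diffing and tests.
        return json.dumps(event, sort_keys=True)
    if fmt == "plain":
        # Hypothetical field names, chosen for illustration only.
        return "{timestamp} {playbook} {task} {status}".format(**event)
    raise ValueError("unknown log format: %s" % fmt)
```

&lt;p&gt;Defaulting the parameter to “plain” mirrors the consensus mentioned above.&lt;/p&gt;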
&lt;/section&gt;
&lt;section id="filename-convention"&gt;
&lt;h4&gt;Filename convention&lt;/h4&gt;
&lt;p&gt;As said, we’re running multiple playbooks during the actions, and we also want
to have some kind of history.&lt;/p&gt;
&lt;p&gt;In order to do that, the easiest way to get a name is to concatenate the time
and the playbook name, something like:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;timestamp&lt;/em&gt;-&lt;em&gt;playbookname&lt;/em&gt;.json&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
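&lt;p&gt;A minimal sketch of this naming scheme follows. The compact UTC timestamp
format and the function name are assumptions made for illustration; the spec
only states that the time and the playbook name are concatenated.&lt;/p&gt;

```python
import re
from datetime import datetime, timezone


def log_filename(playbook_path, when=None):
    """Return 'TIMESTAMP-PLAYBOOKNAME.json' for one playbook run."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Assumed format: compact UTC timestamp, which sorts chronologically as text.
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    name = playbook_path.rsplit("/", 1)[-1]
    # Drop the .yaml or .yml extension, then keep the name filesystem-friendly.
    name = re.sub(r"\.ya?ml$", "", name)
    name = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
    return f"{stamp}-{name}.json"
```

&lt;p&gt;Putting the timestamp first means a plain directory listing already shows the
runs in chronological order, which helps with the history use case.&lt;/p&gt;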
&lt;/section&gt;
&lt;section id="use-systemd-journald-instead-of-files"&gt;
&lt;h4&gt;Use systemd/journald instead of files&lt;/h4&gt;
&lt;p&gt;One might want to use systemd/journald instead of plain files. While this
sounds appealing, there are multiple potential issues:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Sensitive data will be shown in the system’s journald, within reach of any
other user&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Journald has rate limits and thresholds, meaning we might hit them and
therefore lose logs, or prevent other services from using journald for their
own logging&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;While we can configure a log service (rsyslog, syslog-ng, etc) in order to
output specific content to specific files, we will face access issues on
them&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Therefore, we shouldn’t use journald.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="does-it-meet-the-requirements"&gt;
&lt;h3&gt;Does it meet the requirements?&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;No service addition: yes - it’s only a change in the CLI, no new dependency is
needed (tripleoclient already depends on validations-common, which depends on
validations-libs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No increase in operation time: this has to be proven with proper PoC and
metrics gathering/comparison.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Existing Tool: yes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Actively maintained: so far, yes - expected to be extended outside of TripleO&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;KISS: yes, based on the validations-libs and simple Ansible callback&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h2&gt;Alternatives&lt;/h2&gt;
&lt;section id="ara"&gt;
&lt;h3&gt;ARA&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://ara.recordsansible.org/"&gt;ARA Records Ansible&lt;/a&gt; provides some of the functionality we implemented in
the Validation Framework logging, but it lacks some of the wanted features,
such as&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CLI integration within tripleoclient&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Independence from third-party services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Plain file logging, so that the logs can be collected with SOSReport or other tools&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ARA needs a DB backend - we could inject results in the existing galera DB, but
that might create some issues with the concurrent accesses happening during a
deploy for instance. Using sqlite is also an option, but it means new packages,
new file location to save, binary format and so on.&lt;/p&gt;
&lt;p&gt;It also needs a web server in order to show the reporting, meaning yet
another httpd configuration, and the need to access it on the undercloud.&lt;/p&gt;
&lt;p&gt;Also, ARA being a whole service, it would need to be deployed, configured,
and maintained - and checked before each action to ensure it is properly
running and gets the logs.&lt;/p&gt;
&lt;p&gt;By default, ARA doesn’t affect the actual playbook output, while the goal of
this spec is mostly about it: provide concise feedback to the operator, while
keeping the logs on disk, in files, with the ability to interact with them
through the CLI directly.&lt;/p&gt;
&lt;p&gt;In the end, ARA might be a solution, but it will require more work to get it
integrated, and, since the TripleO UI has been deprecated, there is no real way
to integrate it in an existing UI tool.&lt;/p&gt;
&lt;section id="would-it-meet-the-requirements"&gt;
&lt;h4&gt;Would it meet the requirements?&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;No service addition: no, due to the “REST API” aspect. A service must answer
API calls&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No increase in operation time: probably yes, depending on the way ARA
manages its input queues. Since it’s also using a callback, we have to account
for the potential resources used by it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Existing tool: yes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Actively maintained: yes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;KISS: yes, but it adds new dependencies (DB backend, Web server, ARA service,
and so on)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note on the “new dependencies”: while ARA can be launched
&lt;a class="reference external" href="https://ara.readthedocs.io/en/latest/cli.html#ara-manage-generate"&gt;without any service&lt;/a&gt;, it seems to be only for devel purpose, according to the
informative note we can read on the documentation page:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;Good&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;small&lt;/span&gt; &lt;span class="n"&gt;scale&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="n"&gt;inefficient&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;contains&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;lot&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;small&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;
&lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;large&lt;/span&gt; &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Therefore, we shouldn’t use ARA.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="proposed-roadmap"&gt;
&lt;h2&gt;Proposed Roadmap&lt;/h2&gt;
&lt;p&gt;In Xena:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Ensure we have all the API capabilities within validations-libs in order to
set needed/wanted parameters for a different log location and file naming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Start to work on the ansible-runner calls so that it uses a tweaked callback,
using the validations-libs capabilities in order to get the direct feedback
as well as the formatted file in the right location&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h2&gt;Security Impact&lt;/h2&gt;
&lt;p&gt;As we’re going to store full Ansible output on disk, we must ensure that
access to the log location is closed to any unwanted user. As stated while talking
about the file location, the directory mode and ownership must be set so that
only the needed users can access its content (root + stack user).&lt;/p&gt;
&lt;p&gt;Once this is sorted out, no other security impact is to be expected. Furthermore,
it will even make things more secure than now, since the current way
Ansible is launched within tripleoclient puts an “ansible.log” file in the
operator home directory without any specific rights.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h2&gt;Upgrade Impact&lt;/h2&gt;
&lt;p&gt;Apart from ensuring the log location exists, there isn’t any major upgrade
impact. A doc update must be done in order to point to the log location, as
well as some messages within the CLI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="end-user-impact"&gt;
&lt;h2&gt;End User Impact&lt;/h2&gt;
&lt;p&gt;There are two impacts to the End User:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CLI output will be reworked in order to provide useful information (see
Direct Feedback above)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Log location will change a bit for the ansible part (see File Logging above)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h2&gt;Performance Impact&lt;/h2&gt;
&lt;p&gt;A limited impact is to be expected - but proper PoC with metrics must be
conducted to assess the actual change.&lt;/p&gt;
&lt;p&gt;Multiple deploys must be done, with different Overcloud designs, in order to
see how the actual impact changes with the number of nodes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="deployer-impact"&gt;
&lt;h2&gt;Deployer Impact&lt;/h2&gt;
&lt;p&gt;Same as End User Impact: CLI output will be changed, and the log location will
be updated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h2&gt;Developer Impact&lt;/h2&gt;
&lt;p&gt;The callback is enabled by default, but the Developer might want to disable it.
Proper doc should reflect this. No real impact in the end.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="contributors"&gt;
&lt;h3&gt;Contributors&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Cédric Jeanneret&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mathieu Bultel&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Modify validations-libs in order to provide the needed interface (shouldn’t
be really needed, the libs are already modular and should expose the wanted
interfaces and parameters)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a new callback in tripleo-ansible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure the log directory is created with the correct rights&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the ansible-runner calls to enable the callback by default&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure tripleoclient outputs status update on a regular basis while the logs
are being written in the right location&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update/create the needed documentation about the new logging location and
management&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
</description><pubDate>Thu, 04 Jun 2020 00:00:00 </pubDate></item><item><title>Enable TripleO to deploy Dell EMC PowerFlex software defined storage via Ansible</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/victoria/tripleo-powerflex-integration.html</link><description>
 
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem description&lt;/h2&gt;
&lt;p&gt;There is currently no automated way to deploy VxFlexOS from within TripleO.
The goal is to provide ease of use at deployment time as well as during
lifecycle operations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-changes"&gt;
&lt;h2&gt;Proposed changes&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;VxFlexOS has been rebranded to PowerFlex.&lt;/p&gt;
&lt;p&gt;The deployer experience to stand up PowerFlex with TripleO should be the
following:&lt;/p&gt;
&lt;p&gt;The deployer chooses to deploy a role containing any of the PowerFlex services:
PowerflexMDM, PowerflexLIA, PowerflexSDS and PowerflexSDC.&lt;/p&gt;
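&lt;p&gt;At least three new Overcloud roles should be defined, such as:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Controller with PowerFlex&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute with PowerFlex&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage with PowerFlex&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;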
&lt;p&gt;Custom role definitions are used to define which service will run on which
type of node. We’ll use this custom roles_data.yaml to deploy the overcloud.&lt;/p&gt;
&lt;p&gt;PowerFlex support for HCI, which combines compute and storage into a single
node, has been considered but will not be part of the first drop.&lt;/p&gt;
&lt;p&gt;The deployer provides the PowerFlex parameters as offered today in a Heat env
file.&lt;/p&gt;
&lt;p&gt;The deployer starts the deployment and gets an overcloud with PowerFlex and
appropriate services deployed on each node per its role.
The current code, still a work in progress, is available here:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/dell/tripleo-powerflex"&gt;https://github.com/dell/tripleo-powerflex&lt;/a&gt;&lt;/p&gt;
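&lt;p&gt;The following files are created in
/usr/share/openstack-tripleo-heat-templates/deployment/powerflex-ansible:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;powerflex-base.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;powerflex-lia.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;powerflex-mdm.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;powerflex-sdc.yaml&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;powerflex-sds.yaml&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These files are responsible for the configuration of each service. Each
service is based upon the powerflex-base.yaml template, which calls the Ansible
playbook and triggers the deployment.&lt;/p&gt;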
&lt;p&gt;The directory /usr/share/powerflex-ansible holds the Ansible playbook which
installs and configures PowerFlex.&lt;/p&gt;
&lt;p&gt;A new tripleo-ansible role is created in /usr/share/ansible/roles called
tripleo-powerflex-run-ansible which prepares the variables and triggers the
execution of the PowerFlex Ansible playbook.&lt;/p&gt;
&lt;p&gt;An environment file named powerflex-ansible.yaml is created in
/usr/share/openstack-tripleo-heat-templates/environments/powerflex-ansible
and defines the resource registry mapping and additional parameters required by
the PowerFlex Ansible playbook.&lt;/p&gt;
&lt;p&gt;Ports which have to be opened are managed by TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="powerflex-deployment-with-tripleo-ansible"&gt;
&lt;h3&gt;PowerFlex deployment with TripleO Ansible&lt;/h3&gt;
&lt;p&gt;Proposal to create a TripleO Ansible playbook to deploy a PowerFlex system.&lt;/p&gt;
&lt;p&gt;We refer to a PowerFlex system as a set of services deployed on nodes on a
per-role basis.&lt;/p&gt;
&lt;p&gt;The playbook described here assumes the following:&lt;/p&gt;
&lt;p&gt;A deployer chooses to deploy PowerFlex and includes the following Overcloud
roles, which install the PowerFlex services based upon the mapping found in
THT’s roles_data.yaml:&lt;/p&gt;
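&lt;table class="docutils align-default"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Role&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Associated PowerFlex service&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Controller&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;PowerflexMDM, PowerflexLIA, PowerflexSDC&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;Compute&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;PowerflexLIA, PowerflexSDC&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Storage&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;PowerflexLIA, PowerflexSDS&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;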
&lt;p&gt;The deployer chooses to include new Heat environment files which will be in THT
when this spec is implemented. An environment file will change the
implementation of any of the four services from the previous step.&lt;/p&gt;
&lt;p&gt;A new Ansible playbook is called during the deployment which triggers the
execution of the appropriate PowerFlex Ansible playbook.&lt;/p&gt;
&lt;p&gt;This can be identified as a cascading-ansible deployment.&lt;/p&gt;
&lt;p&gt;A separate Ansible playbook will be created for each goal described below:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Initial deployment of OpenStack and PowerFlex&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update and upgrade PowerFlex SW&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scaling up or down (Day-N operations)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This proposal only refers to a single PowerFlex system deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="rpms-kernel-dependencies"&gt;
&lt;h3&gt;RPMS/Kernel dependencies&lt;/h3&gt;
&lt;p&gt;Virt-Customize will be used to inject the rpms into the overcloud-full-image for
new installations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="version-dependencies"&gt;
&lt;h3&gt;Version dependencies&lt;/h3&gt;
&lt;p&gt;Version control is handled outside the current proposal. The staging area has the
PowerFlex packages specific to the OS version of the overcloud image.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="ansible-playbook"&gt;
&lt;h2&gt;Ansible playbook&lt;/h2&gt;
&lt;section id="initial-deployment-of-openstack-and-powerflex"&gt;
&lt;h3&gt;Initial deployment of OpenStack and PowerFlex&lt;/h3&gt;
&lt;p&gt;The sequence of events for this new Ansible playbook to be triggered during
initial deployment with TripleO follows:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Define the Overcloud on the Undercloud in Heat. This includes the Heat
parameters that are related to PowerFlex, which will later be passed to
powerflex-ansible via the TripleO Ansible playbook.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt; with default PowerFlex options and include
a new Heat environment file to make the implementation of the service
deployment use powerflex-ansible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The undercloud assembles and uploads the deployment plan to the undercloud
Swift.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO starts to deploy the Overcloud and interfaces with Heat accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A point in the deployment is reached where the Overcloud nodes are imaged,
booted, and networked. At that point the undercloud has access to the
provisioning or management IPs of the Overcloud nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The TripleO Ansible playbook is responsible for deploying PowerFlex with any
of the four PowerFlex services: PowerflexMDM, PowerflexLIA, PowerflexSDS
and PowerflexSDC.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The servers which host PowerFlex services have their relevant firewall ports
opened according to the needs of their service, e.g. the PowerflexMDM nodes are
configured to accept traffic on TCP ports 9011 and 6611.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A new Heat environment file which defines additional parameters that we want
to override is passed to the TripleO Ansible playbook.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The TripleO Ansible playbook translates these parameters so that they match
the parameters that powerflex-ansible expects. The translation entails building
an argument list that may be passed to the playbook by calling
&lt;cite&gt;ansible-playbook --extra-vars&lt;/cite&gt;. An alternative location for the
/usr/share/powerflex-ansible playbook is possible via an argument. No
playbooks are run yet at this stage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The TripleO Ansible playbook is called and passed the list
of parameters as described earlier. A dynamic Ansible inventory is used with the
&lt;cite&gt;-i&lt;/cite&gt; option. In order for powerflex-ansible to work, the inventory must
contain groups called &lt;cite&gt;[mdms]&lt;/cite&gt;, &lt;cite&gt;[tbs]&lt;/cite&gt;, &lt;cite&gt;[sdss]&lt;/cite&gt; and &lt;cite&gt;[sdcs]&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The TripleO Ansible playbook starts the PowerFlex install using the
powerflex-ansible set of playbooks.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
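&lt;p&gt;The parameter translation and inventory check described in these steps could
look roughly like the sketch below. It is an illustration under assumptions, not
the tripleo-powerflex code: the function names, the Heat-to-Ansible name map and
the example parameter names are hypothetical. Only the -i and --extra-vars
options of ansible-playbook and the four group names come from the steps
above.&lt;/p&gt;

```python
import json

# Inventory groups that powerflex-ansible expects, as listed in the steps above.
REQUIRED_GROUPS = ("mdms", "tbs", "sdss", "sdcs")


def missing_groups(inventory):
    """Return the required groups that are absent or empty in the inventory."""
    return [group for group in REQUIRED_GROUPS if not inventory.get(group)]


def build_command(playbook, inventory_path, heat_params, name_map):
    """Translate Heat parameters and build the ansible-playbook argument list."""
    # name_map is a hypothetical mapping from Heat parameter names to the
    # variable names expected by the playbook; unmapped parameters are dropped.
    extra_vars = {name_map[k]: v for k, v in heat_params.items() if k in name_map}
    return [
        "ansible-playbook",
        "-i", inventory_path,
        "--extra-vars", json.dumps(extra_vars, sort_keys=True),
        playbook,
    ]
```

&lt;p&gt;Building the argument list first and running it in a separate step matches
the note that no playbooks are run during the translation stage.&lt;/p&gt;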
&lt;/section&gt;
&lt;section id="update-upgrade-powerflex-sw"&gt;
&lt;h3&gt;Update/Upgrade PowerFlex SW&lt;/h3&gt;
&lt;p&gt;TBD&lt;/p&gt;
&lt;/section&gt;
&lt;section id="scaling-up-down"&gt;
&lt;h3&gt;Scaling up/down&lt;/h3&gt;
&lt;p&gt;This implementation supports adding or removing SDS and/or SDC nodes at any time
(Day+N operations) using the same deployment method.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The deployer chooses which type of node to add to or remove from the
PowerFlex system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployer launches an update on the Overcloud which will bring up or down
the nodes to add or remove.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The nodes are added to or removed from the Overcloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The SDS and SDC software is added to or removed from the PowerFlex system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage capacity is updated accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A scale-down operation succeeds only if:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;the minimum of 3 SDS nodes remains&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the free storage capacity available is enough for rebalancing the data&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
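&lt;p&gt;The two scale-down conditions can be expressed as a simple pre-check. This is
a sketch only: the function and parameter names are hypothetical, and how free
capacity and the amount of data to rebalance are measured is left as an
assumption.&lt;/p&gt;

```python
# Minimum number of SDS nodes that must remain, as stated above.
MIN_SDS_NODES = 3


def can_scale_down(sds_count, sds_to_remove, free_capacity_gb, data_to_rebalance_gb):
    """Return True when removing SDS nodes keeps the system within both limits."""
    remaining = sds_count - sds_to_remove
    enough_nodes = remaining >= MIN_SDS_NODES
    # Assumption: both capacities are expressed in the same unit (GB here).
    enough_space = free_capacity_gb >= data_to_rebalance_gb
    return enough_nodes and enough_space
```

&lt;p&gt;For example, removing one SDS node from a three-node system is rejected
regardless of the available capacity.&lt;/p&gt;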
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="powerflex-services-breakdown"&gt;
&lt;h2&gt;PowerFlex services breakdown&lt;/h2&gt;
&lt;p&gt;The PowerFlex system is broken down into multiple components, each of which has
to be installed on specific node types.&lt;/p&gt;
&lt;section id="non-hci-model"&gt;
&lt;h3&gt;Non HCI model&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Controllers will host the PowerflexLIA, PowerflexMDM and PowerflexSDC (Glance)
components. A minimum of 3 MDMs is required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Computes will host the PowerflexLIA and PowerflexSDC as they will be
responsible for accessing volumes. There is no minimum.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage will host the PowerflexLIA and PowerflexSDS as disks will be presented
as backend.  A minimum of 3 SDS is required. A minimum of 1 disk per SDS is
also required to connect the SDS.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="hci-model"&gt;
&lt;h3&gt;HCI model&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Controllers will host the PowerflexLIA, PowerflexMDM and PowerflexSDC (Glance)
components. A minimum of 3 MDMs is required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute HCI will host the PowerflexLIA and PowerflexSDC as they will be
responsible for accessing volumes and the PowerflexSDS as disks will be
presented as backend.  A minimum of 3 SDS is required. A minimum of 1 disk per
SDS is also required to connect the SDS.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h2&gt;Security impact&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A new SSH key pair will be created on the undercloud.
The public key of this pair will be installed in the heat-admin user’s
authorized_keys file on all Overcloud nodes which will be MDMs, SDSs, or SDCs.
This process will follow the same pattern used to create the SSH keys used for
TripleO validations, so nothing new would happen in that respect; it is just another
instance of the same type of process.&lt;/p&gt;&lt;/li&gt;
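&lt;li&gt;&lt;p&gt;Additional firewall configuration needs to include all TCP/UDP ports needed by
PowerFlex services according to the following:&lt;/p&gt;
&lt;table class="docutils align-default"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Overcloud role&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;PowerFlex Service&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Ports&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Controller&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;LIA, SDC, SDS&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;9099, 7072, 6611, 9011&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;Compute&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;LIA, SDC&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;9099&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Storage&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;LIA, SDS&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;9099, 7072&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/li&gt;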
&lt;li&gt;&lt;p&gt;Kernel module packages such as scini.ko will be installed depending on the
version of the operating system of the overcloud node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Question: Will any SELinux change be needed for the IP ports that VxFlexOS
is using?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h2&gt;Performance Impact&lt;/h2&gt;
&lt;p&gt;The following applies to the undercloud:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;TripleO Ansible will need to run an additional playbook&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 29 Apr 2020 00:00:00 </pubDate></item><item><title>Simple Container Generation</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/victoria/simple-container-generation.html</link><description>
 
&lt;p&gt;Simple container generation is an initiative to reduce complexity in the
TripleO container build, deployment, and distribution process by reducing the
size and scope of the TripleO container build tools.&lt;/p&gt;
&lt;p&gt;The primary objective of this initiative is to replace Kolla, and our
associated Kolla customization tools, as the selected container generation
tool-kit. The TripleO community has long desired an easier solution for
deployers and integrators alike and this initiative is making that desire a
reality.&lt;/p&gt;
&lt;p&gt;The Simple container generation initiative aims to pivot from a
tool-chain mired between a foundational component of Kolla-Ansible and a
general purpose container build system, to a vertically integrated solution
that only constructs what TripleO needs, in a minimally invasive and
simple to understand way.&lt;/p&gt;
&lt;p&gt;&lt;a class="footnote-reference brackets" href="#f3" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;TripleO currently leverages Kolla to produce container images. These images are
built for Kolla-Ansible using an opinionated build process which has general
purpose features. While our current images work, they’re large and not well
suited for the TripleO use-case, especially in distributed data-centers. The
issue of container complexity and size impacts three major groups: deployers,
third-party integrators, and maintainers. As the project is aiming to simplify
interactions across the stack, the container life cycle and build process has
been identified as something that needs to evolve. The TripleO project needs
something vertically integrated which produces smaller images, that are easier
to maintain, with far fewer gyrations required to tailor images to our needs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Implement a container file generation role, and a set of statically defined
override variable files which are used to generate our required
container files. &lt;a class="footnote-reference brackets" href="#f2" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;section id="layering"&gt;
&lt;h4&gt;Layering&lt;/h4&gt;
&lt;div class="highlight-text notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;tripleo-base+---+
                |
                |
                +---+-openstack-${SERVICE}-1-common-+--&amp;gt;openstack-${SERVICE}-1-a
                    |                               |
                    |                               +--&amp;gt;openstack-${SERVICE}-1-b
                    |                               |
                    |                               +--&amp;gt;openstack-${SERVICE}-1-c
                    +--&amp;gt;openstack-${SERVICE}-2
                    |
                    +--&amp;gt;ancillary-${SERVICE}-1
                    |
                    +--&amp;gt;ancillary-${SERVICE}-2
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="user-experience"&gt;
&lt;h4&gt;User Experience&lt;/h4&gt;
&lt;p&gt;Building the standard set of images will be done through a simple command line
interface using the TripleO python client.&lt;/p&gt;
&lt;div class="highlight-shell notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;$ openstack tripleo container image build &lt;span class="o"&gt;[&lt;/span&gt;opts&lt;span class="o"&gt;]&lt;/span&gt; &amp;lt;args&amp;gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This simple sub-command will provide users the ability to construct images as
needed, generate container files, and debug runtime issues.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="cli-options"&gt;
&lt;h4&gt;CLI Options&lt;/h4&gt;
&lt;p&gt;The following options are available for the new container image build entry point in the Python TripleO client.&lt;/p&gt;
&lt;table class="docutils align-default"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Option&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Default&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;config-file&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;$PATH/overcloud_containers.yaml&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Configuration file setting the list of containers to build.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;exclude&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;[]&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Container type to exclude from the build. Can be specified multiple times.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;work-dir&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;/tmp/container-builds&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Container builds directory, storing the container files and
logs for each image and its dependencies.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;skip-push&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;False&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Skip pushing images to the registry.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;skip-build&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;False&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Only generates container files without producing a local build.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;base&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;centos&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Base image name.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;type&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;binary&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Image type.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tag&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;latest&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Image tag.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;registry&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;localhost&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Container registry URL.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;namespace&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;tripleomaster&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Container registry namespace.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;volume&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;[]&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Container bind mount used when building the image. Specify the
option multiple times to mount multiple volumes.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/section&gt;
&lt;section id="container-image-build-tools"&gt;
&lt;h4&gt;Container Image Build Tools&lt;/h4&gt;
&lt;p&gt;Container images will be built using &lt;a class="reference external" href="https://buildah.io"&gt;Buildah&lt;/a&gt;. The required Buildah
functionality will leverage &lt;cite&gt;BuildahBuilder&lt;/cite&gt; via &lt;cite&gt;python-tripleoclient&lt;/cite&gt;
integration and be exposed through CLI options.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="image-layout"&gt;
&lt;h4&gt;Image layout&lt;/h4&gt;
&lt;p&gt;Each image will have its own YAML file which has access to the following
parameters. Each YAML file will have one required parameter (tcib_from for the
source image to build from) and optional parameters.&lt;/p&gt;
&lt;table class="docutils align-default"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Option&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Default&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Type&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Required&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_path&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;cite&gt;{{ lookup(‘env’, ‘HOME’) }}&lt;/cite&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;String&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Path where the container file(s) for a given image are
generated.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_args&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Dict[str, str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Single level &lt;cite&gt;key:value&lt;/cite&gt; pairs. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#arg"&gt;arg&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_from&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;cite&gt;centos:8&lt;/cite&gt;&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Str&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;True&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Container image to build from. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#from"&gt;from&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_labels&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Dict[str, str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Single level &lt;cite&gt;key:value&lt;/cite&gt; pairs. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#label"&gt;label&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_envs&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Dict[str, str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Single level &lt;cite&gt;key:value&lt;/cite&gt; pairs. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#env"&gt;env&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_onbuilds&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;List[str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;&amp;lt;item&amp;gt;=String. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#onbuild"&gt;onbuild&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_volumes&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;List[str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;&amp;lt;item&amp;gt;=String. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#volume"&gt;volume&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_workdir&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Str&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#workdir"&gt;workdir&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_adds&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;List[str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;&amp;lt;item&amp;gt;=String. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#add"&gt;add&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_copies&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;List[str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;&amp;lt;item&amp;gt;=String. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#copy"&gt;copy&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_exposes&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;List[str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;&amp;lt;item&amp;gt;=String. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#expose"&gt;expose&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_user&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Str&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#user"&gt;user&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_shell&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Str&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#shell"&gt;shell&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_runs&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;List[str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;&amp;lt;item&amp;gt;=String. Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#run"&gt;run&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_healthcheck&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Str&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#healthcheck"&gt;healthcheck&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_stopsignal&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Str&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#stopsignal"&gt;stopsignal&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_entrypoint&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Str&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#entrypoint"&gt;entrypoint&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_cmd&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Str&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Implements &lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#cmd"&gt;cmd&lt;/a&gt;.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_actions&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;List[Dict[str, str]]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Each item is a single-level dictionary of &lt;cite&gt;key:value&lt;/cite&gt;
pairs. Allows for arbitrary verbs while maintaining
ordering.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tcib_gather_files&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;List[str]&lt;/p&gt;&lt;/td&gt;
&lt;td/&gt;
&lt;td&gt;&lt;p&gt;Each item is a String. Collects files from the
host and stores them in the build directory.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Application packages are sorted within each container configuration file.
This provides a programmatic interface to derive package sets, allows
overrides, and is easily visualized. While the package option is not
processed by the &lt;cite&gt;tripleo_container_image_build&lt;/cite&gt; role, it will serve as a
standard within our templates.&lt;/p&gt;
&lt;table class="docutils align-default"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Option&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tcib_packages&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Dictionary of packages to install.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;common&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;openstack-${SERVICE}-common&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;distro-1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;common&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;openstack-${SERVICE}-proprietary&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;x86_64&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;$dep-x86_64&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;power&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;$dep-power&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;distro-2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;common&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;openstack-${SERVICE}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;$dep&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This option is then captured and processed by a simple &lt;cite&gt;RUN&lt;/cite&gt; action.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;tcib_actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"dnf&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;install&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tcib_packages['common']&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tcib_packages[ansible_distribution][ansible_architecture]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/blockquote&gt;
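&lt;p&gt;Because each &lt;cite&gt;tcib_packages&lt;/cite&gt; entry is a list, its values generally need to be
flattened into a space-separated string before they reach the package manager.
The following is a minimal sketch of the same &lt;cite&gt;RUN&lt;/cite&gt; action, assuming the role
renders action strings with Jinja2 and that the distribution keys are
lower-case. Both points are assumptions and are not confirmed by this spec.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;tcib_actions:
  # join(' ') turns each package list into "pkg-a pkg-b";
  # lower maps a fact such as "CentOS" to the key "centos" (assumed key style).
  - run: "dnf install -y {{ tcib_packages['common'] | join(' ') }} {{ tcib_packages[ansible_distribution | lower][ansible_architecture] | join(' ') }}"
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;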
&lt;/section&gt;
&lt;section id="example-container-variable-file"&gt;
&lt;h4&gt;Example Container Variable File&lt;/h4&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;tcib_from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ubi8&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;lookup('env',&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'HOME')&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}/example-image"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_labels&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;maintainer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;MaintainerX&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_entrypoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;dumb-init --single-child --&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_stopsignal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;SIGTERM&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_envs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;LANG&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;en_US.UTF-8&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_runs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;mkdir -p /etc/ssh &amp;amp;&amp;amp; touch /etc/ssh/ssh_known_host&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_copies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/etc/hosts /opt/hosts&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_gather_files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/etc&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_packages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;common&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;curl&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;centos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;x86_64&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;wget&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="nt"&gt;tcib_actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"dnf&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;install&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tcib_packages['common']&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tcib_packages[ansible_distribution][ansible_architecture]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/etc/resolv.conf /resolv.conf&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"/bin/bash"&lt;/span&gt;&lt;span class="p p-Indicator"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"-c"&lt;/span&gt;&lt;span class="p p-Indicator"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"echo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;world"&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="container-file-structure"&gt;
&lt;h4&gt;Container File Structure&lt;/h4&gt;
&lt;p&gt;The generated container file(s) will follow a simple directory structure
that provides an easy way to view and understand build relationships and
dependencies throughout the stack.&lt;/p&gt;
&lt;div class="highlight-shell notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;tripleo-base/&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERFILE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
tripleo-base/ancillary-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-1/&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERFILE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
tripleo-base/ancillary-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-2/&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERFILE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
tripleo-base/openstack-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-1-common/&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERFILE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
tripleo-base/openstack-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-1-common/openstack-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-1-a/&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERFILE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
tripleo-base/openstack-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-1-common/openstack-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-1-b/&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERFILE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
tripleo-base/openstack-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-1-common/openstack-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-1-c/&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERFILE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
tripleo-base/openstack-&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;-2/&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERFILE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
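&lt;p&gt;The nesting of directories mirrors the build relationships: a child image
builds from its parent through &lt;cite&gt;tcib_from&lt;/cite&gt;. The sketch below shows two variable
files as separate YAML documents. It assumes the parent image is published as
registry/namespace/name:tag using the CLI defaults; the package name and the
exact image reference are hypothetical.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# tripleo-base: built directly from the distribution image
tcib_from: centos:8
tcib_packages:
  common:
    - curl
tcib_actions:
  - run: "dnf install -y {{ tcib_packages['common'] | join(' ') }}"
---
# openstack-${SERVICE}-1-common: built on top of tripleo-base
# (hypothetical image reference: registry/namespace/name:tag)
tcib_from: localhost/tripleomaster/tripleo-base:latest
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;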
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Use Ansible Bender&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ansible Bender was evaluated as a tool which could help to build the container
images. However, it has not been productized downstream, which would make it
difficult to consume. It doesn’t generate Dockerfiles and there is a strong
dependency on the Bender tool; the container image build process would therefore
be more difficult to do in a standalone environment where Bender isn’t available.
&lt;a class="footnote-reference brackets" href="#f1" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Leave the container image build process untouched.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We could leave the container image generation process untouched. This keeps us a
consumer of Kolla and requires we maintain our complex ancillary tooling to
ensure Kolla containers work for TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;While security is not a primary virtue in the simple container generation
initiative, security will be improved by moving to simplified containers. If
the simple container generation initiative is ratified, all containers used
within TripleO will be vertically integrated into the stack, making it possible
to easily audit the build tools and all applications, services, and files
installed into our containerized runtimes. With simplification we’ll improve
the ease of understanding and transparency which makes our project more
sustainable, thereby more secure. The proposed solution must provide layers
where we know exactly which command has been run, so we can quickly figure out
how an image was built.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;There is no upgrade impact because the new container images will provide
feature parity with the previous ones; they will have the same or similar
injected scripts that are used when the containers start.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;We should expect better performance out of our containers, as they will be
smaller. While the runtime will act the same, the software delivery will be
faster as the size of each container will be smaller, with better constructed
layers. Smaller containers will decrease the mean time to ready which will have
a positive performance impact and generally improve the user experience.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The simplified container generation initiative will massively help third party
integrators. With simplified container build tools we will be able to easily
articulate requirements to folks looking to build on top of TripleO. Our
tool-chain will be capable of bootstrapping applications where required, and
simple enough to integrate with a wide variety of custom applications
constructed in bespoke formats.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;In the first phase, there won’t be any developer impact because the produced
images will provide the same base layers as before. For example, they will
contain all the Kolla scripts that are required to merge configuration files or
initialize the container at startup.&lt;/p&gt;
&lt;p&gt;These scripts will be injected in the container images for backward
compatibility:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;kolla_extend_start&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;set_configs.py&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;start.sh&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;copy_cacerts.sh&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;httpd_setup.sh&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In a second phase, we will simplify these scripts to remove what isn’t needed
by TripleO. The interface in the composable services will likely evolve over
time. For example kolla_config will become container_config. There is no plan
at this time to rewrite the configuration file merge logic.&lt;/p&gt;
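&lt;p&gt;As an illustration of the rename, a composable service fragment might change
as follows. This is a sketch only: the key layout is modeled on the existing
&lt;cite&gt;kolla_config&lt;/cite&gt; interface, while the file path, the command and the final shape
of &lt;cite&gt;container_config&lt;/cite&gt; are assumptions.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Current interface (hypothetical service and paths)
kolla_config:
  /var/lib/kolla/config_files/example.json:
    command: /usr/bin/example-service
    config_files:
      - source: "/var/lib/kolla/config_files/src/*"
        dest: "/"
        merge: true
---
# After the rename (assumed: only the top-level key changes)
container_config:
  /var/lib/kolla/config_files/example.json:
    command: /usr/bin/example-service
    config_files:
      - source: "/var/lib/kolla/config_files/src/*"
        dest: "/"
        merge: true
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;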
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Cloudnull&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EmilienM&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;section id="first-phase"&gt;
&lt;h4&gt;First phase&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Ansible role to generate container file(s) - &lt;a class="reference external" href="https://review.opendev.org/#/c/722557"&gt;https://review.opendev.org/#/c/722557&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Container images layouts - &lt;a class="reference external" href="https://review.opendev.org/#/c/722486"&gt;https://review.opendev.org/#/c/722486&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deprecate “openstack overcloud container image build”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement “openstack tripleo container image build” which will reuse the
&lt;cite&gt;BuildahBuilder&lt;/cite&gt; and the same logic as the deprecated command but without Kolla.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build new images and publish them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Switch the upstream CI to use the new images.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Second phase:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Simplifying the injected scripts to only do what we need in TripleO.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rename the configuration interfaces in TripleO Heat Templates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;The tooling will be in existing repositories so there is no new dependency. It
will mainly be in tripleo-ansible, tripleo-common, python-tripleoclient and
tripleo-heat-templates. Like before, Buildah will be required to build the
images.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The tripleo-build-containers-centos-8 job will be switched to use
the new “openstack tripleo container image build” command.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A molecule job will exercise the container image build process using
the new role.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An end-to-end job that builds a container and deploys it into
a running deployment will also be investigated.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Much of the documentation impact will be focused on cleanup of the existing
documentation which references Kolla, and the creation of documentation that
highlights the use of the vertically integrated stack.&lt;/p&gt;
&lt;p&gt;Although the changes should be transparent to end users who just pull images
without rebuilding them, the manuals will be updated with the new command
and options for anyone who wants to build the images themselves.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="f1" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/#/c/722136/"&gt;https://review.opendev.org/#/c/722136/&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="f2" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/#/c/722557/"&gt;https://review.opendev.org/#/c/722557/&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="f3" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/simplified-containers"&gt;https://blueprints.launchpad.net/tripleo/+spec/simplified-containers&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Mon, 27 Apr 2020 00:00:00 </pubDate></item><item><title>TripleO Ceph</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ceph.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-ceph"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-ceph&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A light Ansible framework for TripleO integration with Ceph clusters
deployed with &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; and managed with Ceph &lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt;.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Starting in the Octopus release, Ceph has its own day1 tool called
&lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; and its own day2 tool called &lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt; which will
replace &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt;. What should TripleO’s Ceph integration
do about this? We currently provide the following user experience:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;Describe an OpenStack deployment, which includes Ceph, and TripleO
will “make it so”&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;The above has been true for TripleO since Kilo and should
continue. TripleO should also continue hyper-converged support
(collocation of OpenStack and Ceph containers). There is sufficient
value in both of these (one tool and hyper-convergence) to justify
this project. At the same time we want to deploy Ceph in a way
consistent with the way the Ceph project is moving and decouple the
complexity of day2 management of Ceph from TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Modify tripleo-ansible, tripleo-heat-templates, and
python-tripleoclient in support of the following goals:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Provide Ansible roles which deploy Ceph by calling &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; and Ceph
orchestrator&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Focus on the day1 problem for Ceph RBD, RGW, CephFS, and Dashboard
deployment by leveraging &lt;cite&gt;cephadm bootstrap --apply-spec&lt;/cite&gt; as
described in Ceph issue &lt;a class="reference external" href="https://tracker.ceph.com/issues/44873"&gt;44873&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;By default, day2 Ceph operations should be done directly with Ceph
&lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt; or Ceph Dashboard and not by running &lt;cite&gt;openstack
overcloud deploy&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO stack updates do not trigger the new Ansible roles
introduced by this spec.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide an opinionated Ceph installation based on parameters from
TripleO (including hardware details from Ironic)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure cephx keyrings and pools for OpenStack on a deployed Ceph
cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support collocation (hyperconvergence) of OpenStack/Ceph containers
on the same host&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; reconciliation loop must not break OpenStack configuration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO configuration updates must not break Ceph configuration&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide Ceph integration but maximize orthogonality between
OpenStack and Ceph&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
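&lt;p&gt;To make the day1 goal more concrete, the following is a minimal sketch of
a spec file that could be passed to &lt;cite&gt;cephadm bootstrap --apply-spec&lt;/cite&gt;.
It uses the cephadm service specification format; the host name and address
are hypothetical placeholders and the set of services is an assumption made
for illustration, not something prescribed by this spec:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Hypothetical spec file; hostname and addr are placeholders.
service_type: host
hostname: controller-0
addr: 192.168.24.10
---
# Run a monitor on the bootstrap host.
service_type: mon
placement:
  hosts:
    - controller-0
---
# Run a manager on the bootstrap host.
service_type: mgr
placement:
  hosts:
    - controller-0
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Keeping hosts and services in one declarative file is what would let an
Ansible role generate the desired state from TripleO parameters and hand it to
cephadm in a single call.&lt;/p&gt;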
&lt;p&gt;The implementation of the TripleO CephClient service during the W
cycle is covered in a different spec in review &lt;a class="reference external" href="https://review.opendev.org/#/c/757644"&gt;757644&lt;/a&gt;. This work will
be merged before the work described in this spec as it will be
compatible with the current Ceph deployment methods. It will also be
compatible with the future deployment methods described in this spec.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="integration-points"&gt;
&lt;h3&gt;Integration Points&lt;/h3&gt;
&lt;p&gt;The default deployment method of OpenStack/Ceph for TripleO Victoria
is the following 2-step process:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Deploy nodes with &lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html"&gt;metalsmith&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy OpenStack and Ceph with &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The Ceph portion of item 2 uses external_deploy_steps_tasks to call
ceph-ansible by using the tripleo-ansible roles: tripleo_ceph_common,
tripleo_ceph_uuid, tripleo_ceph_work_dir, tripleo_ceph_run_ansible.&lt;/p&gt;
&lt;p&gt;The ultimate goal for this spec is to support the following
4-step process:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Deploy the hardware with &lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html"&gt;metalsmith&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure networking (including storage networks)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy Ceph with the roles and interface provided by tripleo-ansible/python-tripleoclient&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy OpenStack with &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Item 2 above depends on the spec for network data v2 format described
in review &lt;a class="reference external" href="https://review.opendev.org/#/c/752437"&gt;752437&lt;/a&gt; and a subsequent network-related feature which moves
port management out of Heat, and supports applying network
configuration prior to Heat stack deployment described in review
&lt;a class="reference external" href="https://review.opendev.org/#/c/760536"&gt;760536&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Item 3 above is the focus of this spec but it is not necessarily
the only integration point. If it is not possible to configure the
storage networks prior to deploying OpenStack, then the new method
of Ceph deployment will still happen via external_deploy_steps_tasks
as it currently does in Victoria via the 2-step process. Another way
to say this is that Ceph may be deployed &lt;em&gt;during&lt;/em&gt; the overcloud
deployment in the 2-step process or Ceph may be deployed &lt;em&gt;before&lt;/em&gt; the
overcloud during the 4-step process; in either case we will change how
Ceph is deployed.&lt;/p&gt;
&lt;p&gt;The benefit of deploying Ceph before deploying the overcloud is that
the complexity of the Ceph deployment is decoupled from the complexity
of the OpenStack deployment. Even if Ceph is deployed before the
overcloud, its deployment remains a part of TripleO in the same way that
the bare metal deployment remains a part of TripleO, even though a
separate tool (e.g. &lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html"&gt;metalsmith&lt;/a&gt; or &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;) is used to deploy
resources which are not deployed when &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt;
is run.&lt;/p&gt;
&lt;p&gt;Additional details on how Ceph is deployed before vs during the
overcloud deployment are covered in the implementation section.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We could ask deployers to do this:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Deploy hardware and configure networking&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; and &lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt; directly to configure that hardware
with Ceph and create OpenStack pools accessible by CephX clients&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use TripleO to configure OpenStack&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We have completed a POC of the above using Ussuri and config-download
tags to only run certain steps, but would prefer to offer an option to
automate the Ceph deployment. The TripleO project has already ensured
that the move from step one to step three is automated and requires only two
commands because the tripleo python client now has an option to call
&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html"&gt;metalsmith&lt;/a&gt;. The alternative is to not automate step two, but that is
not user-friendly.&lt;/p&gt;
&lt;p&gt;Another alternative is to continue using &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; as we do today.
However, even though &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; can deploy Octopus today and will
continue to support deployment of Luminous and Nautilus, the project
has a &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/cephadm-adopt.yml"&gt;cephadm-adopt&lt;/a&gt; playbook for converting Ceph clusters that it has
deployed to management by &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; &lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt;, so it seems to be moving
away from true Octopus support. &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; has a lot of code and day2
support; porting ceph-ansible itself to &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; or &lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt; is
more work than completing this project with a smaller scope and looser
coupling.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; tool is imperative and requires SSH access to the Ceph
cluster nodes in order to execute remote commands and deploy the
specified services. This command will need to be installed on one of
the overcloud nodes which will host the composable CephMon service.
From the &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; point of view, that node will be a bootstrap node
on which the Ceph cluster is created.&lt;/p&gt;
&lt;p&gt;For this reason the Ceph cluster nodes must be SSH accessible and
provide a user with root privileges to perform some tasks. For
example, the standard way to add a new host when using &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; is to
run the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;ssh-copy-id -f -i /etc/ceph/ceph.pub root@&amp;lt;new-host&amp;gt;&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;ceph orch host add &amp;lt;new-host&amp;gt;&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The TripleO deployment flow, and in particular config-download,
already provides the key elements to properly configure and run
the two actions described above, hence the impact from a security
point of view is unchanged compared to the previous deployment model.&lt;/p&gt;
&lt;p&gt;We will create a user like ceph-admin using the same process
config-download uses to create the tripleo-admin user and then
&lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; will use this user when it runs commands to add other
hosts.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;Ceph Nautilus clusters are still managed by ceph-ansible, and &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;
can be enabled as the new default backend once the Octopus release
is reached. Therefore, starting from Nautilus, two main steps are
identified in the upgrade process:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Upgrade the cluster using &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; &lt;cite&gt;rolling_update&lt;/cite&gt;:
&lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; should provide, as already done in the past, a rolling
update playbook that can be executed to upgrade all the services to
the Octopus release&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migrate the existing cluster to cephadm/orchestrator: when all the
services are updated to Octopus &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/cephadm-adopt.yml"&gt;cephadm-adopt&lt;/a&gt; will be executed as
an additional step&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Newly deployed Ceph Octopus clusters will use &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; and Ceph
&lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt; by default, and the future upgrade path will be provided
by &lt;a class="reference external" href="https://docs.ceph.com/docs/master/cephadm/upgrade"&gt;cephadm_upgrade&lt;/a&gt;, which will be able to run, stop and resume all
the Ceph upgrade phases. At that point day2 Ceph operations will need
to be carried out directly with Ceph orchestrator. Thus, it will no
longer be necessary to include the
&lt;cite&gt;tripleo-heat-templates/environments/ceph-ansible/*&lt;/cite&gt; files in the
&lt;cite&gt;openstack overcloud deploy&lt;/cite&gt; command, with the exception of the Ceph
client configuration as described in review &lt;a class="reference external" href="https://review.opendev.org/#/c/757644"&gt;757644&lt;/a&gt;, which will have a
new environment file.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;The upgrade process for future releases may be subject to slight
modifications according to the OpenStack requirements.&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The main benefit from the operator perspective is the ability to take
advantage of the clear separation between the deployment phase and
day2 operations as well as the separation between the Ceph deployment
and the OpenStack deployment. At the same time TripleO can still
address all the deployment phase operations with a single tool while
relying on &lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt; for day2 tasks.&lt;/p&gt;
&lt;p&gt;Many common tasks can now be performed the same way regardless of whether
the Ceph cluster is internal to (deployed by) or external to TripleO.
The operator can use the &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; and &lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt; tools which will
be accessible from one of the Ceph cluster monitor nodes.&lt;/p&gt;
&lt;p&gt;For instance, since &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; maintains the status of the cluster, the
operator is now able to perform the following tasks without interacting
with TripleO at all:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Monitor replacement&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OSD replacement (if a hardware change is necessary then Ironic
might be involved)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Even though standalone &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;, when combined with Ceph
&lt;a class="reference external" href="https://docs.ceph.com/en/latest/mgr/orchestrator/"&gt;orchestrator&lt;/a&gt;, should support all the commands required to
carry out day2 operations, our plan is for tripleo-ceph to
continue to manage and orchestrate other actions that can
be taken by an operator when TripleO should be involved. For example,
if a CephStorage node is added as a scale-up operation, then
the tripleo-ceph Ansible roles should make calls to add the OSDs.&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Stack updates will not trigger Ceph tools, so “OpenStack only” changes
won’t be delayed by Ceph operations. Ceph client configuration will
take less time, though this benefit is covered in review &lt;a class="reference external" href="https://review.opendev.org/#/c/757644"&gt;757644&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Like ceph-ansible, &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; is distributed as an RPM and can be
installed from Ceph repositories. However, since the deployment
approach is changed and &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; requires a Ceph monitor node to
bootstrap a minimal cluster, we would like to install the &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;
RPM on the overcloud image. As of today this RPM is approximately 46K
and we expect this to simplify the installation process. When &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;
bootstraps the first Ceph monitor (on the first Controller node by
default) it will download the necessary Ceph containers. To contrast
this proposal with the current Ceph integration, &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; needs
to be installed on the undercloud and it then manages the download of
Ceph containers to overcloud nodes. In the case of both &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; and
ceph-ansible, no other package changes are needed for the overcloud
nodes as both tools run Ceph in containers.&lt;/p&gt;
&lt;p&gt;This change affects all TripleO users who deploy an Overcloud which
interfaces with Ceph. Any TripleO user who does not interface with
Ceph will not be directly impacted by this project.&lt;/p&gt;
&lt;p&gt;TripleO users who currently use
&lt;cite&gt;environments/ceph-ansible/ceph-ansible.yaml&lt;/cite&gt; in order to have their
overcloud deploy an internal Ceph cluster will need to migrate to the
new method when deploying W. This file and others will be deprecated as
described in more detail below.&lt;/p&gt;
&lt;p&gt;The proposed changes do not take immediate effect after they are
merged because both the &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; and &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; interfaces will
exist in-tree concurrently.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;How Ceph is deployed could change for anyone maintaining TripleO code
for OpenStack services which use Ceph. In theory there should be no
change as the CephClient service will still configure the Ceph
configuration and Ceph key files in the same locations. Those
developers will just need to switch to the new interfaces when they
are stable.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;This section covers in more detail how configuration data is passed to
the new tooling when Ceph is deployed &lt;em&gt;before&lt;/em&gt; or &lt;em&gt;during&lt;/em&gt; the
overcloud deployment, as described in the Integration Points section at the
beginning of this spec.&lt;/p&gt;
&lt;section id="deprecations"&gt;
&lt;h3&gt;Deprecations&lt;/h3&gt;
&lt;p&gt;Files in &lt;cite&gt;tripleo-heat-templates/environments/ceph-ansible/*&lt;/cite&gt; and
&lt;cite&gt;tripleo-heat-templates/deployment/ceph-ansible/*&lt;/cite&gt; will be deprecated
in W and removed in X. They will be obsoleted by the new THT
parameters covered in the next section with the exception of
&lt;cite&gt;ceph-ansible/ceph-ansible-external.yaml&lt;/cite&gt; which will be replaced by
&lt;cite&gt;environments/ceph-client.yaml&lt;/cite&gt; as described in review &lt;a class="reference external" href="https://review.opendev.org/#/c/757644"&gt;757644&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The following tripleo-ansible roles will be deprecated at the start
of W: tripleo_ceph_common, tripleo_ceph_uuid, tripleo_ceph_work_dir,
and tripleo_ceph_run_ansible. The ceph_client role will not be
deprecated but it will be re-implemented as described in review
&lt;a class="reference external" href="https://review.opendev.org/#/c/757644"&gt;757644&lt;/a&gt;. New roles will be introduced to tripleo-ansible to replace
them.&lt;/p&gt;
&lt;p&gt;Until the project described here is completed during X, we will
continue to maintain the deprecated &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; roles and
Heat templates for the duration of W, so it is likely that for
one release we will have in-tree support for both &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; and
&lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="new-tht-templates"&gt;
&lt;h3&gt;New THT Templates&lt;/h3&gt;
&lt;p&gt;Not all THT configuration for Ceph can be removed. The firewall is
still configured based on THT as described in the next section and THT
also controls which composable service is deployed and where. The
following new files will be created in
&lt;cite&gt;tripleo-heat-templates/environments/&lt;/cite&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;cephadm.yaml: triggers new cephadm Ansible roles until &lt;cite&gt;openstack
overcloud ceph …&lt;/cite&gt; makes it unnecessary. Contains the paths to the
files described in the Ceph End State Definition YAML Input section.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ceph-rbd.yaml: RBD firewall ports, pools and cephx key defaults&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ceph-rgw.yaml: RGW firewall ports, pools and cephx key defaults&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ceph-mds.yaml: MDS firewall ports, pools and cephx key defaults&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ceph-dashboard.yaml: defaults for Ceph Dashboard firewall ports&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of the above (except cephadm.yaml) will result in the appropriate
firewall ports being opened as well as a new idempotent Ansible role
connecting to the Ceph cluster in order to create the Ceph pools and
cephx keys to access those pools. Which ports, pools and keys are
created will depend on which files are included. E.g. if the deployer
ran &lt;cite&gt;openstack overcloud deploy … -e ceph-rbd.yaml -e ceph-rgw.yaml&lt;/cite&gt;
then the ports, pools and cephx keys would be configured for Nova,
Cinder, and Glance to use Ceph RBD and RGW would be configured with
Keystone, but no firewall ports, pools and keys for the MDS service
would be created and the firewall would not be opened for the Ceph
dashboard.&lt;/p&gt;
&lt;p&gt;None of the above files, except cephadm.yaml, will result in Ceph
itself being deployed and none of the parameters needed to deploy Ceph
itself will be in the above files. E.g. PG numbers and OSD devices
will not be defined in THT anymore. Instead the parameters which are
needed to deploy Ceph itself will be in tripleo_ceph_config.yaml as
described in the Ceph End State Definition YAML Input section and
cephadm.yaml will only contain references to those files.&lt;/p&gt;
&lt;p&gt;The cephx keys and pools, created as described above, will result in
output data which looks like the following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;pools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;volumes&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;vms&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;backups&lt;/span&gt;
&lt;span class="n"&gt;openstack_keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;caps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
  &lt;span class="n"&gt;mon&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="n"&gt;rbd&lt;/span&gt;
  &lt;span class="n"&gt;osd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'profile rbd pool=volumes, profile rbd pool=backups,&lt;/span&gt;
       &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="n"&gt;rbd&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="n"&gt;rbd&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;
  &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AQCwmeRcAAAAABAA6SQU&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bGqFjlfLro5KxrB1Q&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;
  &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'0600'&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The above can be written to a file, e.g. ceph_client.yaml, and passed
as input to the new ceph client role described in review &lt;a class="reference external" href="https://review.opendev.org/#/c/757644"&gt;757644&lt;/a&gt;
(along with the ceph_data.yaml file produced as output as described in
Ceph End State Definition YAML Output).&lt;/p&gt;
&lt;p&gt;In DCN deployments this type of information is extracted from the Heat
stack with &lt;cite&gt;overcloud export ceph&lt;/cite&gt;. When the new method of deployment
is used this information can come directly from each generated YAML
file (e.g. ceph_data.yaml and ceph_client.yaml) per Ceph cluster.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="firewall"&gt;
&lt;h3&gt;Firewall&lt;/h3&gt;
&lt;p&gt;Today the firewall is not configured by &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; and it won’t be
configured by &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;, as its &lt;cite&gt;--skip-firewalld&lt;/cite&gt; option will be used. We
expect the default overcloud to not have firewall rules until
&lt;cite&gt;openstack overcloud deploy&lt;/cite&gt; introduces them. The THT parameters
described in the previous section will have the same firewall ports as
the ones they will deprecate (&lt;cite&gt;environments/ceph-ansible/*&lt;/cite&gt;) so that
the appropriate ports per service and based on composable roles will
be opened in the firewall as they are today.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="osd-devices"&gt;
&lt;h3&gt;OSD Devices&lt;/h3&gt;
&lt;p&gt;The current defaults will always be wrong for someone because the
&lt;cite&gt;devices&lt;/cite&gt; list of available disks will always vary based on hardware.
The new default will use all available devices when creating OSDs by
running &lt;cite&gt;ceph orch apply osd --all-available-devices&lt;/cite&gt;. It will still
be possible to override this default, though the &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; syntax of
the &lt;cite&gt;devices&lt;/cite&gt; list will be deprecated. In its place the OSD Service
Specification defined by &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; drivegroups will be used and the tool
will apply it by running &lt;cite&gt;ceph orch apply osd -i osd_spec.yml&lt;/cite&gt;. More
information on the &lt;cite&gt;osd_spec.yml&lt;/cite&gt; file is covered in the Ceph End State
Definition YAML Input section.&lt;/p&gt;
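&lt;p&gt;As a minimal sketch of such an override, assuming the OSD service
specification format documented by cephadm, an &lt;cite&gt;osd_spec.yml&lt;/cite&gt; that
behaves like the all-available-devices default might look like the
following. The &lt;cite&gt;service_id&lt;/cite&gt; value is a hypothetical name chosen for
illustration.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Illustrative OSD service specification; service_id is a made-up name.
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: '*'      # apply to every host known to the orchestrator
data_devices:
  all: true              # consume every available device as an OSD
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Narrowing &lt;cite&gt;host_pattern&lt;/cite&gt; or replacing &lt;cite&gt;all: true&lt;/cite&gt; with device
filters is how a deployer would restrict which disks become OSDs.&lt;/p&gt;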
&lt;/section&gt;
&lt;section id="ceph-placement-group-parameters"&gt;
&lt;h3&gt;Ceph Placement Group Parameters&lt;/h3&gt;
&lt;p&gt;The new tool will deploy Ceph with the pg autotuner feature enabled.
Parameters to set the placement groups will be deprecated. Those who
wish to disable the pg autotuner may do so using Ceph CLI tools after
Ceph is deployed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="ceph-end-state-definition-yaml-input"&gt;
&lt;h3&gt;Ceph End State Definition YAML Input&lt;/h3&gt;
&lt;p&gt;Regardless of whether Ceph is deployed &lt;em&gt;before&lt;/em&gt; or &lt;em&gt;during&lt;/em&gt; overcloud
deployment, a new playbook which deploys Ceph using &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; will be
created and it will accept the following files as input:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;deployed-metal.yaml: this file is generated by running a command
like &lt;cite&gt;openstack overcloud node provision … --output
deployed-metal.yaml&lt;/cite&gt; when using &lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html"&gt;metalsmith&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(Optional) “deployed-network-env”: the file that is generated by
&lt;cite&gt;openstack network provision&lt;/cite&gt; as described in review &lt;a class="reference external" href="https://review.opendev.org/#/c/752437"&gt;752437&lt;/a&gt;. This
file is used when deploying Ceph before the overcloud to identify
the storage networks. This will not be necessary when deploying Ceph
during overcloud deployment so it is optional and the storage
network will be identified instead as it is today.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(Optional) Any valid &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; config.yml spec file as described in
Ceph issue &lt;a class="reference external" href="https://tracker.ceph.com/issues/44205"&gt;44205&lt;/a&gt; may be directly passed to the &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; execution
and where applicable will override all relevant settings in the file
described at the end of this list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(Optional) Any valid &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/drivegroups"&gt;drivegroup&lt;/a&gt; YAML file (e.g. osd_spec.yml) may
be passed and the tooling will apply it with &lt;cite&gt;ceph orch apply osd -i
osd_spec.yml&lt;/cite&gt;. This setting will override all relevant settings in
the file described at the end of this list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo_ceph_config.yaml: This file will contain configuration data
compatible with nearly all Ceph options supported today by TripleO
Heat Templates with the exception of the firewall, ceph pools and
cephx keys. A template of this file will be provided as a default
in one of the new tripleo-ansible roles (e.g. tripleo_cephadm_common).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
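&lt;p&gt;For illustration only, and assuming the optional cephadm spec file
follows the cephadm service specification format, such a file could
declare a host and place a monitor on it as sketched below. The
hostname and label are hypothetical, and the address simply reuses the
example storage subnet shown elsewhere in this spec.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Hypothetical host and label; not output from a real deployment.
service_type: host
hostname: ceph-0
addr: 172.18.0.5
labels:
- mon
---
# Run a monitor on every host carrying the "mon" label.
service_type: mon
placement:
  label: mon
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;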
&lt;p&gt;Another source of data which is input into the new playbook is the
inventory, which is covered in the next section.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="ansible-inventory-and-ansible-user"&gt;
&lt;h3&gt;Ansible Inventory and Ansible User&lt;/h3&gt;
&lt;p&gt;The current Ceph implementation uses the Ansible user tripleo-admin.
That user and the corresponding SSH keys are created by the
tripleo-ansible role tripleo_create_admin. This role uses the
heat-admin account which is the default account if &lt;cite&gt;openstack
overcloud node provision&lt;/cite&gt; is not passed the &lt;cite&gt;--overcloud-ssh-user&lt;/cite&gt;
option. The current implementation also uses the inventory generated
by tripleo-ansible-inventory. These resources will not be available
if Ceph is deployed &lt;em&gt;before&lt;/em&gt; the overcloud and there’s no reason they
are needed if Ceph is deployed &lt;em&gt;during&lt;/em&gt; the overcloud deployment.&lt;/p&gt;
&lt;p&gt;Regardless of whether Ceph is deployed &lt;em&gt;before&lt;/em&gt; or &lt;em&gt;during&lt;/em&gt; overcloud, prior
to deploying Ceph, &lt;cite&gt;openstack overcloud admin authorize&lt;/cite&gt; should be run
and it should pass options to enable a ceph-admin user which can be
used by &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; and to allow SSH access for the ansible roles
described in this spec.&lt;/p&gt;
&lt;p&gt;A new command, &lt;cite&gt;openstack overcloud ceph inventory&lt;/cite&gt; will be
implemented which creates an Ansible inventory for the new playbook
and roles described in this spec. This command will require the
following input:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;deployed-metal.yaml: this file is generated by running a command
like &lt;cite&gt;openstack overcloud node provision … --output
deployed-metal.yaml&lt;/cite&gt; when using &lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html"&gt;metalsmith&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(Optional) roles.yaml: If this file is not passed then
/usr/share/openstack-tripleo-heat-templates/roles_data.yaml will be
used in its place. If the roles in deployed-metal.yaml do not have a
definition found in roles.yaml, then an error is thrown that a role
being used is undefined. By using this file, the TripleO composable
roles will continue to work as they do today. The services matching
“OS::TripleO::Services::Ceph*” will correspond to a new Ansible
inventory group and the hosts in that group will correspond to the
hosts found in deployed-metal.yaml.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(Optional) &lt;cite&gt;-u --ssh-user &amp;lt;USER&amp;gt;&lt;/cite&gt;: this is not a file but an option
which defaults to “ceph-admin”. This represents the user which was
created on all overcloud nodes by &lt;cite&gt;openstack overcloud admin
authorize&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(Optional) &lt;cite&gt;-i --inventory &amp;lt;FILE&amp;gt;&lt;/cite&gt;: this is not a file but an option
which defaults to “/home/stack/inventory.yaml”. This represents the
inventory which will be created.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
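&lt;p&gt;To make the mapping from composable roles to inventory groups more
concrete, the generated file might resemble the sketch below, which
uses the standard Ansible YAML inventory layout. The group names,
hostnames and addresses are assumptions made for illustration; only the
“ceph-admin” default user comes from the option described above.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Hypothetical inventory.yaml; group and host names are illustrative only.
all:
  vars:
    ansible_user: ceph-admin        # default of the ssh-user option
  children:
    ceph_mon:                       # e.g. roles providing a Ceph monitor service
      hosts:
        oc0-controller-0:
          ansible_host: 192.168.24.10
    ceph_osd:                       # e.g. roles providing a Ceph OSD service
      hosts:
        oc0-ceph-0:
          ansible_host: 192.168.24.20
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;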
&lt;p&gt;If Ceph is deployed before the overcloud, users will need to run this
command to generate an Ansible inventory file. They will also need to
pass the path to the generated inventory file to &lt;cite&gt;openstack overcloud
ceph provision&lt;/cite&gt; as input.&lt;/p&gt;
&lt;p&gt;If Ceph is deployed &lt;em&gt;during&lt;/em&gt; overcloud deployment, users do not need
to know about this command as external_deploy_steps_tasks will run
this command directly to generate the inventory before running the new
tripleo ceph playbook with this inventory.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="ceph-end-state-definition-yaml-output"&gt;
&lt;h3&gt;Ceph End State Definition YAML Output&lt;/h3&gt;
&lt;p&gt;The new playbook will write output data to one yaml file which
contains information about the Ceph cluster and may be used as
input to other processes.&lt;/p&gt;
&lt;p&gt;In the case that Ceph is deployed before the overcloud, if &lt;cite&gt;openstack
overcloud ceph provision --output ceph_data.yaml&lt;/cite&gt; were run, then
&lt;cite&gt;ceph_data.yaml&lt;/cite&gt; would then be passed to &lt;cite&gt;openstack overcloud deploy
… -e ceph_data.yaml&lt;/cite&gt;. The &lt;cite&gt;ceph_data.yaml&lt;/cite&gt; file will contain
key/value pairs such as the Ceph FSID, Name, and the Ceph monitor IPs.&lt;/p&gt;
&lt;p&gt;In the case that Ceph is deployed with the overcloud, if
external_deploy_steps_tasks calls the new playbook, then the same file
will be written to its default location (/home/stack/ceph_data.yaml)
and the new client role will directly read the parameters from this file.&lt;/p&gt;
&lt;p&gt;An example of what this file, e.g. &lt;cite&gt;ceph_data.yaml&lt;/cite&gt;, looks like is:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ceph&lt;/span&gt;
&lt;span class="n"&gt;fsid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;af25554b&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="n"&gt;f6&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;d2b&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="n"&gt;b9b&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d08a1132d3e899&lt;/span&gt;
&lt;span class="n"&gt;ceph_mon_ips&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;172.18.0.5&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;172.18.0.6&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;172.18.0.7&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In DCN deployments this type of information is extracted from the Heat
stack with &lt;cite&gt;overcloud export ceph&lt;/cite&gt;. When the new method of deployment
is used this information can come directly from the &lt;cite&gt;ceph_data.yaml&lt;/cite&gt;
file per Ceph cluster. This file will be passed as input to the new
ceph client role described in review &lt;a class="reference external" href="https://review.opendev.org/#/c/757644"&gt;757644&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="requirements-for-deploying-ceph-during-overcloud-deployment"&gt;
&lt;h3&gt;Requirements for deploying Ceph during Overcloud deployment&lt;/h3&gt;
&lt;p&gt;If Ceph is deployed &lt;em&gt;during&lt;/em&gt; the overcloud deployment, the following
should be the case:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The external_deploy_steps_tasks playbook will execute the new
Ansible roles after &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt; is executed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If &lt;cite&gt;openstack overcloud node provision .. --output
deployed-metal.yaml&lt;/cite&gt; were run, then &lt;cite&gt;deployed-metal.yaml&lt;/cite&gt; would be
input to &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt;. This is the current behavior
we have in V.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Node scale up operations for day2 Ceph should be done by running
&lt;cite&gt;openstack overcloud node provision&lt;/cite&gt; and then &lt;cite&gt;openstack overcloud
deploy&lt;/cite&gt;. This will include reasserting the configuration of
OpenStack services unless those operations are specifically set to
“noop”.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployment creates its own Ansible inventory and user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The path to the “Ceph End State Definition YAML Input” is referenced
via a THT parameter so that when external_deploy_steps_tasks runs it
will pass this file to the new playbook.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="requirements-for-deploying-ceph-before-overcloud-deployment"&gt;
&lt;h3&gt;Requirements for deploying Ceph before Overcloud deployment&lt;/h3&gt;
&lt;p&gt;If Ceph is deployed &lt;em&gt;before&lt;/em&gt; the overcloud deployment, the following
should be the case:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The new Ansible roles will be triggered when the user runs a command
like &lt;cite&gt;openstack overcloud ceph …&lt;/cite&gt;; this command is meant
to be run after running &lt;cite&gt;openstack overcloud node provision&lt;/cite&gt; to
trigger &lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html"&gt;metalsmith&lt;/a&gt;  but before running &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If &lt;cite&gt;openstack overcloud node provision .. --output
deployed-metal.yaml&lt;/cite&gt; were run, then &lt;cite&gt;deployed-metal.yaml&lt;/cite&gt; would be
input to &lt;cite&gt;openstack overcloud ceph provision&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Node scale up operations for day2 Ceph should be done by running
&lt;cite&gt;openstack overcloud node provision&lt;/cite&gt;, &lt;cite&gt;openstack overcloud network
provision&lt;/cite&gt;, and &lt;cite&gt;openstack overcloud admin authorize&lt;/cite&gt; to enable a
ceph-admin user. However, it isn’t necessary to run &lt;cite&gt;openstack
overcloud ceph …&lt;/cite&gt; because the operator should connect to the Ceph
cluster itself to add the extra resources, e.g. use a cephadm shell
to add the new hardware as OSDs or other Ceph resources. If the
operation includes adding a hyperconverged node with both Ceph and
OpenStack services, then the third step will be to run &lt;cite&gt;openstack
overcloud deploy&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Requires the user to create an inventory (and user) before
using the new Ceph deployment tools.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Ceph End State Definition YAML Input” is directly passed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="container-registry-support"&gt;
&lt;h3&gt;Container Registry Support&lt;/h3&gt;
&lt;p&gt;Hosting a container registry on the undercloud is already
supported. This registry contains Ceph and OpenStack containers
and it may be populated before deployment or during deployment.
When deploying Ceph before overcloud deployment it will need to be
populated before deployment. The new integration described in this
spec will direct &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; to pull the Ceph containers from the same
source identified by &lt;cite&gt;ContainerCephDaemonImage&lt;/cite&gt;. For example:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;ContainerCephDaemonImage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;undercloud&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ctlplane&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mydomain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tld&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8787&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ceph&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;daemon&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;v4&lt;/span&gt;&lt;span class="mf"&gt;.0.13&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;stable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;nautilus&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;centos&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x86_64&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="network-requirements-for-ceph-to-be-deployed-before-the-overcloud"&gt;
&lt;h3&gt;Network Requirements for Ceph to be deployed before the Overcloud&lt;/h3&gt;
&lt;p&gt;The deployment will be completed by running the following commands:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;openstack overcloud node provision …&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;openstack overcloud network provision …&lt;/cite&gt; (see review &lt;a class="reference external" href="https://review.opendev.org/#/c/751875"&gt;751875&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;openstack overcloud ceph …&lt;/cite&gt; (triggers cephadm/orchestrator)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;openstack overcloud deploy …&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the past stack updates did everything, but the split for
&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html"&gt;metalsmith&lt;/a&gt; established a new pattern. As per review &lt;a class="reference external" href="https://review.opendev.org/#/c/752437"&gt;752437&lt;/a&gt; and a
follow up spec to move port management out of Heat, and apply network
configuration prior to the Heat stack deployment, it will eventually
be possible for the network to be configured before &lt;cite&gt;openstack
overcloud deploy&lt;/cite&gt; is run. This creates an opening for the larger goal
of this spec which is a looser coupling between Ceph and OpenStack
deployment while retaining full integration. After the storage and
storage management networks are configured, then Ceph can be deployed
before any OpenStack services are configured. This should be possible
regardless of whether the same node hosts both Ceph and OpenStack
containers.&lt;/p&gt;
&lt;p&gt;Development work for deploying Ceph before overcloud deployment
can begin before the work described in reviews &lt;a class="reference external" href="https://review.opendev.org/#/c/752437"&gt;752437&lt;/a&gt; and &lt;a class="reference external" href="https://review.opendev.org/#/c/760536"&gt;760536&lt;/a&gt;
is completed by either of the following methods:&lt;/p&gt;
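&lt;p&gt;Option 1:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;openstack overcloud deploy --skip-tags step2,step3,step4,step5&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;use tripleo-ceph development code to stand up Ceph&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;cite&gt;openstack overcloud deploy --tags step2,step3,step4,step5&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;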
&lt;p&gt;The last step will also configure the ceph clients. This sequence has
been verified to work in a proof of concept of this proposal.&lt;/p&gt;
&lt;p&gt;Option 2:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create the storage and storage management networks from the undercloud (using review &lt;a class="reference external" href="https://review.opendev.org/#/c/751875"&gt;751875&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create the Ironic ports for each node as per review &lt;a class="reference external" href="https://review.opendev.org/#/c/760536"&gt;760536&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use instances Nics Properties to pass a list of dicts to provision the node not just on the ctlplane network but also the storage and storage-management networks when the node is provisioned with &lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html"&gt;metalsmith&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Metalsmith/Ironic should attach the VIFs so that the nodes are connected to the Storage and Storage Management networks so that Ceph can then be deployed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
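&lt;p&gt;The list of dicts mentioned in Option 2 could be sketched as follows,
assuming the &lt;cite&gt;nics&lt;/cite&gt; property of the baremetal provisioning input
accepts one dict per requested network. The role name, hostname and
network names are hypothetical and would need to match the networks
actually created from the undercloud.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Hypothetical provisioning input; names are illustrative only.
- name: CephStorage
  count: 1
  instances:
  - hostname: ceph-0
    nics:
    - network: ctlplane        # provisioning network, as today
    - network: storage         # assumed name of the storage network
    - network: storage_mgmt    # assumed name of the storage management network
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;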
&lt;/section&gt;
&lt;section id="pid1-services-used-by-ceph"&gt;
&lt;h3&gt;PID1 services used by Ceph&lt;/h3&gt;
&lt;p&gt;During the W cycle we will not be able to fully deploy an HA Dashboard
and HA RGW service before the overcloud is deployed. Thus, we will
deploy these services as we do today, by using a ceph tool, though
we’ll use &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; in place of &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt;, and then complete the
configuration of these services during overcloud deployment. Though
the work to deploy the service itself will be done before overcloud
deployment, the service won’t be accessible in HA until after the
overcloud deployment.&lt;/p&gt;
&lt;p&gt;Why can’t we fully deploy the HA RGW service before the overcloud?
Though &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; can deploy an HA RGW service without TripleO, its
implementation uses keepalived which cannot be collocated with
pacemaker, which is required on controller nodes. Thus, during the
W cycle we will keep using the RGW service with haproxy and revisit
making it a separate deployment with collaboration with the PID1 team
in a future cycle.&lt;/p&gt;
&lt;p&gt;Why can’t we fully deploy the HA Dashboard service before the
overcloud? &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; does not currently have a builtin HA model for
its dashboard and the HA Dashboard is only available today when it
is deployed by TripleO (unless it’s configured manually).&lt;/p&gt;
&lt;p&gt;Ceph services which need VIPs (Dashboard and RGW) need to know what the
VIPs will be in advance but the VIPs do not need to be pingable before
those Ceph services are deployed. Instead we will be able to know what
the VIPs are before deploying Ceph per the work related to reviews
&lt;a class="reference external" href="https://review.opendev.org/#/c/751875"&gt;751875&lt;/a&gt; and &lt;a class="reference external" href="https://review.opendev.org/#/c/760536"&gt;760536&lt;/a&gt;. We will pass these VIPs as input to &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For example, if we know the Dashboard VIP in advance, we can run the
following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;ceph&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="n"&gt;dashboard&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;grafana&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;dashboard_protocol&lt;/span&gt; &lt;span class="p"&gt;}}:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;VIP&lt;/span&gt; &lt;span class="p"&gt;}}:{{&lt;/span&gt; &lt;span class="n"&gt;grafana_port&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
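&lt;p&gt;Because the command above is templated, it would most naturally run
from an Ansible task. A minimal sketch follows, assuming the variables
&lt;cite&gt;cluster&lt;/cite&gt;, &lt;cite&gt;dashboard_protocol&lt;/cite&gt;, &lt;cite&gt;VIP&lt;/cite&gt; and &lt;cite&gt;grafana_port&lt;/cite&gt; are
already defined for the play; the task name is hypothetical and this is
not taken from the tripleo-ansible roles.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Illustrative task; variable names mirror the command shown above.
- name: Point the Ceph dashboard at Grafana behind the VIP
  ansible.builtin.command:
    argv:
      - ceph
      - --cluster
      - "{{ cluster }}"
      - dashboard
      - set-grafana-api-url
      - "{{ dashboard_protocol }}://{{ VIP }}:{{ grafana_port }}"
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Passing the command as an &lt;cite&gt;argv&lt;/cite&gt; list avoids shell quoting problems
in the URL.&lt;/p&gt;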
&lt;p&gt;The new automation could then save the VIP parameter in the ceph mgr
global config. A deployer could then wait for haproxy to be
available from the overcloud deploy so that an HA dashboard similar to
the one Victoria deploys is available.&lt;/p&gt;
&lt;p&gt;It would be simpler if we could address the above issues before
overcloud deployment but doing so is out of the scope of this spec.
However, we can aim to offer the dashboard in HA with the new tooling
around the time of the X cycle and we hope to do so through
collaboration with the Ceph orchestrator community.&lt;/p&gt;
&lt;p&gt;TripleO today also supports deploying the Ceph dashboard on any
composed network. If the work included in review &lt;a class="reference external" href="https://review.opendev.org/#/c/760536"&gt;760536&lt;/a&gt; allows us to
compose and deploy the overcloud networks in advance, then we plan to
pass parameters to cephadm to continue support of the dashboard on its
own private network.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="tls-everywhere"&gt;
&lt;h3&gt;TLS-Everywhere&lt;/h3&gt;
&lt;p&gt;If Ceph is provisioned before the overcloud, then we will not have
the certificates and keys generated by certmonger via TripleO’s
tls-everywhere framework. We expect cephadm to be able to deploy the
Ceph Dashboard (with Grafana) and RGW (with HA via haproxy) with TLS
enabled. For the sake of orthogonality we could require that the
certificates and keys for RGW and Dashboard be generated outside of
TripleO so that these services could be fully deployed without the
overcloud. However, because we still need to use PID1 services as
described in the previous section, we will continue to use TripleO’s
TLS-e framework.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;fmount&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;fultonj&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gfidente&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;jmolmo&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create a set of roles matching tripleo_ansible/roles/tripleo_cephadm_*
which can coexist with the current tripleo_ceph_common,
tripleo_ceph_uuid, tripleo_ceph_work_dir, tripleo_ceph_run_ansible,
roles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Patch the python tripleo client to support the new command options&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a new external_deploy_steps_tasks interface for deploying
Ceph using the new method during overcloud deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update THT scenario001/004 to use new method of ceph deployment&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-schedule"&gt;
&lt;h3&gt;Proposed Schedule&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;OpenStack W: merge tripleo-ansible/roles/ceph_client described in
review &lt;a class="reference external" href="https://review.opendev.org/#/c/757644"&gt;757644&lt;/a&gt; early as it will work with &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; internal
ceph deployments too. Create tripleo-ansible/roles/cephadm_* roles
and tripleo client work to deploy Octopus as experimental and then
default (only if stable). If new tripleo-ceph is not yet stable,
then Wallaby will release with Nautilus support as deployed by
&lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; just like Victoria. Either way Nautilus support via
current THT and tripleo-ansible triggering &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; will be
deprecated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenStack X: tripleo-ansible/roles/cephadm_* become the default,
tripleo-ansible/roles/ceph_* are removed except the new ceph_client,
tripleo-heat-templates/environments/ceph-ansible/* removed. Migrate
to Ceph Pacific which GAs upstream in March 2021.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The spec for tripleo-ceph-client described in review &lt;a class="reference external" href="https://review.opendev.org/#/c/757644"&gt;757644&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The spec for network data v2 format described in review &lt;a class="reference external" href="https://review.opendev.org/#/c/752437"&gt;752437&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The spec for node ports described in review &lt;a class="reference external" href="https://review.opendev.org/#/c/760536"&gt;760536&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The last two items above are not required if we deploy Ceph during
overcloud deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;This project will be tested against at least two different scenarios.
This will ensure enough coverage on different use cases and cluster
configurations, which is pretty similar to the status of the job
definition currently present in the TripleO CI.
The defined scenarios will test different features that can be enabled
at day1.
As part of the implementation plan, the definition of the
tripleo-heat-templates environment CI files, which contain the testing job
parameters, is one of the goals of this project, and we should make sure
to have:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;a basic scenario that covers the ceph cluster deployment using &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;;
we will gate the tripleo-ceph project against this scenario, as well
as the related tripleo heat templates deployment flow;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;a more advanced use case with the purpose of testing the configurations
that can be applied to the ceph cluster and are orchestrated by the
tripleo-ceph project.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The two items described above are pretty similar to the test suite that
today is maintained in the TripleO CI, and they can be implemented
reworking the existing scenarios, adding the proper support to the
&lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt; deployment model.
A WIP patch can be created and submitted with the purpose of testing
and gating the tripleo-ceph project, and, when it becomes stable
enough, the scenario001 will be able to be officially merged.
The same approach can be applied to the existing scenario004, which
can be seen as an improvement of the first testing job.
This is mostly used to test the Rados Gateway service deployment and
the manila pools and key configuration.
An important aspect of the job definition process is related to
standalone vs multinode.
As seen in the past, multinode can help catch issues that are not
visible in a standalone environment, but of course the job
configuration can be improved in the next cycles, and we can start
with standalone testing, which is what is present today in CI.
Keeping the CI jobs green will always be one of the goals of the
ceph integration project, providing a smooth path and a good experience
moving from &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt; to &lt;a class="reference external" href="https://docs.ceph.com/en/latest/cephadm/"&gt;cephadm&lt;/a&gt;, continuously improving the testing
area to ensure enough coverage of the implemented features.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;tripleo-docs will be updated to cover Ceph integration with the new tool.&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Sat, 25 Apr 2020 00:00:00 </pubDate></item><item><title>Tuskar Plan REST API Specification</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/tripleo-juno-tuskar-rest-api.html</link><description>
 
&lt;p&gt;Blueprint:
&lt;a class="reference external" href="https://blueprints.launchpad.net/tuskar/+spec/tripleo-juno-tuskar-plan-rest-api"&gt;https://blueprints.launchpad.net/tuskar/+spec/tripleo-juno-tuskar-plan-rest-api&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In Juno, the Tuskar API is moving towards a model of being a large-scale
application planning service. Its initial usage will be to deploy OpenStack
on OpenStack by leveraging TripleO Heat Templates and fitting into the
greater TripleO workflow.&lt;/p&gt;
&lt;p&gt;As compared to Icehouse, Tuskar will no longer make calls to Heat for creating
and updating a stack. Instead, it will serve to define and manipulate the Heat
templates for describing a cloud. Tuskar will be the source for the cloud
planning while Heat is the source for the state of the live cloud.&lt;/p&gt;
&lt;p&gt;Tuskar employs the following concepts:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Deployment Plan&lt;/em&gt; - The description of an application (for example,
the overcloud) being planned by Tuskar. The deployment plan keeps track of
what roles will be present in the deployment and their configuration values.
In TripleO terms, each overcloud will have its own deployment plan that
describes what services will run and the configuration of those services
for that particular overcloud. For brevity, this is simply referred to as
the “plan” elsewhere in this spec.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Role&lt;/em&gt; - A unit of functionality that can be added to a plan. A role
is the definition of what will run on a single server in the deployed Heat
stack. For example, an “all-in-one” role may contain all of the services
necessary to run an overcloud, while a “compute” role may provide only the
nova-compute service.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Put another way, Tuskar is responsible for assembling
the user-selected roles and their configuration into a Heat environment and
making the built Heat templates and files available to the caller (the
Tuskar UI in TripleO but, more generally, any consumer of the REST API) to send
to Heat.&lt;/p&gt;
&lt;p&gt;Tuskar will ship with the TripleO Heat Templates installed to serve as its
roles (dependent on the conversions taking place this release &lt;a class="footnote-reference brackets" href="#id7" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;).
For now it is assumed those templates are installed as part of TripleO’s
installation of Tuskar. A different spec will cover the API calls necessary
for users to upload and manipulate their own custom roles.&lt;/p&gt;
&lt;p&gt;This specification describes the REST API that clients will interact with in
Tuskar, including the URLs, HTTP methods, and request and response data, for the
following workflow:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create an empty plan in Tuskar.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;View the list of available roles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add roles to the plan.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Request, from Tuskar, the description of all of the configuration values
necessary for the entire plan.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Save user-entered configuration values with the plan in Tuskar.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Request, from Tuskar, the Heat templates for the plan, which includes
all of the files necessary to deploy the configured application in Heat.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
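&lt;p&gt;As a rough illustration, the workflow above can be written down as an ordered
list of HTTP calls. This is only a sketch: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;workflow_calls&lt;/span&gt;&lt;/code&gt; is a hypothetical helper,
the plan-related paths are the ones defined in the Proposed Change section under the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/v2/&lt;/span&gt;&lt;/code&gt; prefix, and the path used for listing roles is an assumption inferred from the
role links in the response examples.&lt;/p&gt;

```python
# Hypothetical helper: pairs each workflow step with an HTTP method and path.
API_ROOT = "/v2"


def workflow_calls(plan_uuid):
    """Return the (method, path) pairs for the workflow, in order."""
    plan_path = "{0}/plans/{1}/".format(API_ROOT, plan_uuid)
    return [
        ("POST", API_ROOT + "/plans/"),     # create an empty plan
        ("GET", API_ROOT + "/roles/"),      # list roles (path is an assumption)
        ("POST", plan_path + "roles/"),     # add a role to the plan
        ("GET", plan_path),                 # read the plan and its parameters
        ("PATCH", plan_path),               # save configuration values
        ("GET", plan_path + "templates/"),  # fetch the Heat templates
    ]


for method, path in workflow_calls("dd4ef003-c855-40ba-b5a6-3fe4176a069e"):
    print(method, path)
```

&lt;p&gt;In practice the plan UUID is only known once the create call has returned, so a
real client would build the later paths from that response.&lt;/p&gt;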
&lt;p&gt;The list roles call is essential to this workflow and is therefore described
in this specification. Otherwise, this specification does not cover the API
calls around creating, updating, or deleting roles. It is assumed that the
installation process for Tuskar in TripleO will take the necessary steps to
install the TripleO Heat Templates into Tuskar. A specification will be filed
in the future to cover the role-related API calls.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The REST API in Tuskar seeks to fulfill the following needs:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Flexible selection of an overcloud’s functionality and deployment strategy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Repository for discovering what roles can be added to a cloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Help the user to avoid having to manually manipulate Heat templates to
create the desired cloud setup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage of a cloud’s configuration without making the changes immediately
live (future needs in this area may include offering a more structured
review and promotion lifecycle for changes).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Overall Concepts&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;These API calls will be added under the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/v2/&lt;/span&gt;&lt;/code&gt; path; however, the v1 API
will not be maintained (the model is being changed to not contact Heat and
the existing database is being removed &lt;a class="footnote-reference brackets" href="#id6" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All calls have the potential to raise a 500 if something goes horribly wrong
in the server, but for brevity this is omitted from the list of possible
response codes in each call.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All calls have the potential to raise a 401 in the event of a failed user
authentication; this code has been similarly omitted from each call’s
documentation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr class="docutils"/&gt;
&lt;p id="retrieve-single-plan"&gt;&lt;strong&gt;Retrieve a Single Plan&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;URL: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/plans/&amp;lt;plan-uuid&amp;gt;/&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Method: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;GET&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Description: Returns the details of a specific plan, including its
list of assigned roles and configuration information.&lt;/p&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The configuration values are read from Tuskar’s stored files rather than
Heat itself. Heat is the source for the live stack, while Tuskar is the
source for the plan.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Request Data: None&lt;/p&gt;
&lt;p&gt;Response Codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;200 - if the plan is found&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;404 - if there is no plan with the given UUID&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Data:&lt;/p&gt;
&lt;p&gt;JSON document containing the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Tuskar UUID for the given plan.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Name of the plan that was created.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Description of the plan that was created.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The timestamp of the last time a change was made.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;List of the roles (identified by name and version) assigned to the plan.
For this sprint, no role information beyond name and version will be
pre-fetched, but this can be added in the future while maintaining
backward compatibility.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;List of parameters that can be configured for the plan, including the
parameter name, label, description, hidden flag, and current value if
set.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Example:&lt;/p&gt;
&lt;div class="highlight-json notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"uuid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dd4ef003-c855-40ba-b5a6-3fe4176a069e"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dev-cloud"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"description"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Development testing cloud"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"last_modified"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2014-05-28T21:11:09Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"roles"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"uuid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"55713e6a-79f5-42e1-aa32-f871b3a0cb64"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"compute"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"version"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"links"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;"href"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://server/v2/roles/55713e6a-79f5-42e1-aa32-f871b3a0cb64/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;"rel"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bookmark"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"uuid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2ca53130-b9a4-4fa5-86b8-0177e8507803"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"controller"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"version"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;"links"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;"href"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://server/v2/roles/2ca53130-b9a4-4fa5-86b8-0177e8507803/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;"rel"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bookmark"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"parameters"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"database_host"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"label"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Database Host"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"description"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hostname of the database server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"hidden"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"false"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"value"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10.11.12.13"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"links"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="nt"&gt;"href"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://server/v2/plans/dd4ef003-c855-40ba-b5a6-3fe4176a069e/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="nt"&gt;"rel"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"self"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
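&lt;p&gt;A client consuming this response might reduce it to the role list and the
current parameter values. The following is a minimal sketch, assuming the field names
shown in the example above; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;summarize_plan&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;SAMPLE_PLAN&lt;/span&gt;&lt;/code&gt; are hypothetical names
and are not part of Tuskar.&lt;/p&gt;

```python
import json

# Trimmed copy of the response example above (illustrative data only).
SAMPLE_PLAN = """
{
  "uuid": "dd4ef003-c855-40ba-b5a6-3fe4176a069e",
  "name": "dev-cloud",
  "roles": [
    {"uuid": "55713e6a-79f5-42e1-aa32-f871b3a0cb64",
     "name": "compute", "version": "1"},
    {"uuid": "2ca53130-b9a4-4fa5-86b8-0177e8507803",
     "name": "controller", "version": "1"}
  ],
  "parameters": [
    {"name": "database_host", "hidden": "false", "value": "10.11.12.13"}
  ]
}
"""


def summarize_plan(document):
    """Extract role (name, version) pairs and parameter values from a plan."""
    plan = json.loads(document)
    roles = [(role["name"], role["version"]) for role in plan.get("roles", [])]
    # "value" is only present when set, so unset parameters map to None.
    values = {p["name"]: p.get("value") for p in plan.get("parameters", [])}
    return {"uuid": plan["uuid"], "roles": roles, "parameters": values}


summary = summarize_plan(SAMPLE_PLAN)
print(summary["roles"])
print(summary["parameters"])
```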
&lt;hr class="docutils"/&gt;
&lt;p id="retrieve-plan-template"&gt;&lt;strong&gt;Retrieve a Plan’s Template Files&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;URL: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/plans/&amp;lt;plan-uuid&amp;gt;/templates/&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Method: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;GET&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Description: Returns the set of files to send to Heat to create or update
the planned application.&lt;/p&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The Tuskar service will build up the entire environment into a single
file suitable for sending to Heat. The contents of this file are returned
from this call.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Request Data: None&lt;/p&gt;
&lt;p&gt;Response Codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;200 - if the plan’s templates are found&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;404 - if no plan exists with the given UUID&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Data: &amp;lt;Heat template&amp;gt;&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p id="list-plans"&gt;&lt;strong&gt;List Plans&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;URL: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/plans/&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Method: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;GET&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Description: Returns a list of all plans stored in Tuskar. In the future when
multi-tenancy is added, this will be scoped to a particular tenant.&lt;/p&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The detailed information about a plan, including its roles and configuration
values, is not returned in this call. A follow-up call is needed on the
specific plan. It may be necessary in the future to add a flag to pre-fetch
this information during this call.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Request Data: None (future enhancement will require the tenant ID and
potentially support a pre-fetch flag for more detailed data)&lt;/p&gt;
&lt;p&gt;Response Codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;200 - if the list can be retrieved, even if the list is empty&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Data:&lt;/p&gt;
&lt;p&gt;JSON document containing a list of limited information about each plan.
An empty list is returned when no plans are present.&lt;/p&gt;
&lt;p&gt;Response Example:&lt;/p&gt;
&lt;div class="highlight-json notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"uuid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3e61b4b2-259b-4b91-8344-49d7d6d292b6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dev-cloud"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"description"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Development testing cloud"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"links"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="nt"&gt;"href"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://server/v2/plans/3e61b4b2-259b-4b91-8344-49d7d6d292b6/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="nt"&gt;"rel"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bookmark"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"uuid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"135c7391-6c64-4f66-8fba-aa634a86a941"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qe-cloud"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"description"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"QE testing cloud"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nt"&gt;"links"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="nt"&gt;"href"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://server/v2/plans/135c7391-6c64-4f66-8fba-aa634a86a941/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="nt"&gt;"rel"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bookmark"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
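&lt;p&gt;Because the list carries only limited information, a caller typically follows
the bookmark link of the plan it is interested in. The sketch below, which assumes the
list layout shown in the example above, indexes those links by plan name;
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;plan_links_by_name&lt;/span&gt;&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import json

# Trimmed copy of the list response example above (illustrative data only).
SAMPLE_LIST = """
[
  {"uuid": "3e61b4b2-259b-4b91-8344-49d7d6d292b6",
   "name": "dev-cloud",
   "links": {"href": "http://server/v2/plans/3e61b4b2-259b-4b91-8344-49d7d6d292b6/",
             "rel": "bookmark"}},
  {"uuid": "135c7391-6c64-4f66-8fba-aa634a86a941",
   "name": "qe-cloud",
   "links": {"href": "http://server/v2/plans/135c7391-6c64-4f66-8fba-aa634a86a941/",
             "rel": "bookmark"}}
]
"""


def plan_links_by_name(document):
    """Map each plan name to the URL of its detailed representation."""
    plans = json.loads(document)
    # An empty list is a valid 200 response and yields an empty mapping.
    return {plan["name"]: plan["links"]["href"] for plan in plans}


links = plan_links_by_name(SAMPLE_LIST)
print(links["qe-cloud"])
```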
&lt;hr class="docutils"/&gt;
&lt;p id="create-new-plan"&gt;&lt;strong&gt;Create a New Plan&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;URL: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/plans/&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Method: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;POST&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Description: Creates an entry in Tuskar’s storage for the plan. The details
are outside of the scope of this spec, but the idea is that all of the
necessary Heat environment infrastructure files and directories will be
created and stored in Tuskar’s storage solution &lt;a class="footnote-reference brackets" href="#id6" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Unlike in Icehouse, Tuskar will not make any calls into Heat during this
call. This call is to create a new (empty) plan in Tuskar that
can be manipulated, configured, saved, and retrieved in a format suitable
for sending to Heat.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is a synchronous call that completes when Tuskar has created the
necessary files for the newly created plan.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As of this time, this call does not support a larger batch operation that
will add roles or set configuration values in a single call. From a REST
perspective, this is acceptable, but from a usability standpoint we may want
to add this support in the future.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Request Data:&lt;/p&gt;
&lt;p&gt;JSON document containing the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Name - Name of the plan being created. Must be unique across all plans
in the same tenant.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Description - Description of the plan to create.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Request Example:&lt;/p&gt;
&lt;div class="highlight-json notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dev-cloud"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"description"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Development testing cloud"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Response Codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;201 - if the create is successful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;409 - if there is an existing plan with the given name (for a particular
tenant when multi-tenancy is taken into account)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Data:&lt;/p&gt;
&lt;p&gt;JSON document describing the created plan.
The details are the same as for the GET operation on an individual plan
(see &lt;a class="reference internal" href="#retrieve-single-plan"&gt;&lt;span class="std std-ref"&gt;Retrieve a Single Plan&lt;/span&gt;&lt;/a&gt;).&lt;/p&gt;
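&lt;p&gt;The request and its documented outcomes can be sketched as follows. This is an
illustration rather than a client implementation: the function names are hypothetical,
no HTTP request is actually sent, and the 401 and 500 entries come from the Overall
Concepts notes rather than from this call’s own list of response codes.&lt;/p&gt;

```python
import json

# Documented outcomes for the create call, plus the codes any call may raise.
CREATE_PLAN_OUTCOMES = {
    201: "plan created",
    409: "a plan with this name already exists",
    401: "user authentication failed",
    500: "server error",
}


def build_create_plan_request(name, description):
    """Return the method, path and JSON body for creating an empty plan."""
    body = json.dumps({"name": name, "description": description})
    return "POST", "/v2/plans/", body


def describe_create_status(status_code):
    """Translate a response code into a short human-readable outcome."""
    return CREATE_PLAN_OUTCOMES.get(status_code, "not documented for this call")


method, path, body = build_create_plan_request(
    "dev-cloud", "Development testing cloud")
print(method, path, body)
print(describe_create_status(409))
```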
&lt;hr class="docutils"/&gt;
&lt;p id="delete-plan"&gt;&lt;strong&gt;Delete an Existing Plan&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;URL: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/plans/&amp;lt;plan-uuid&amp;gt;/&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Method: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;DELETE&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Description: Deletes the plan’s Heat templates and configuration values from
Tuskar’s storage.&lt;/p&gt;
&lt;p&gt;Request Data: None&lt;/p&gt;
&lt;p&gt;Response Codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;200 - if deleting the plan entries from Tuskar’s storage was successful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;404 - if there is no plan with the given UUID&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Data: None&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p id="add-plan-role"&gt;&lt;strong&gt;Adding a Role to a Plan&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;URL: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/plans/&amp;lt;plan-uuid&amp;gt;/roles/&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Method: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;POST&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Description: Adds the specified role to the given plan.&lt;/p&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This will cause the parameter consolidation to occur and entries to be
added to the plan’s configuration parameters for the new role.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This call will update the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;last_modified&lt;/span&gt;&lt;/code&gt; timestamp to indicate a change
has been made that will require an update to Heat to be made live.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Request Data:&lt;/p&gt;
&lt;p&gt;JSON document containing the uuid of the role to add.&lt;/p&gt;
&lt;p&gt;Request Example:&lt;/p&gt;
&lt;div class="highlight-json notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;"uuid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"role_uuid"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Response Codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;201 - if the addition is successful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;404 - if there is no plan with the given UUID&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;409 - if the plan already has the specified role&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Data:&lt;/p&gt;
&lt;p&gt;The same document describing the plan as from
&lt;a class="reference internal" href="#retrieve-single-plan"&gt;&lt;span class="std std-ref"&gt;Retrieve a Single Plan&lt;/span&gt;&lt;/a&gt;. The newly added
configuration parameters will be present in the result.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p id="remove-cloud-plan"&gt;&lt;strong&gt;Removing a Role from a Plan&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;URL: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/plans/&amp;lt;plan-uuid&amp;gt;/roles/&amp;lt;role-uuid&amp;gt;/&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Method: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;DELETE&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Description: Removes a role identified by role_uuid from the given plan.&lt;/p&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This will cause the parameter consolidation to occur and entries to be
removed from the plan’s configuration parameters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This call will update the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;last_modified&lt;/span&gt;&lt;/code&gt; timestamp to indicate a change
has been made that will require an update to Heat to be made live.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Request Data: None&lt;/p&gt;
&lt;p&gt;Response Codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;200 - if the removal is successful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;404 - if there is no plan with the given UUID or it does not have the
specified role and version combination&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Data:&lt;/p&gt;
&lt;p&gt;The same document describing the plan as from
&lt;a class="reference internal" href="#retrieve-single-plan"&gt;&lt;span class="std std-ref"&gt;Retrieve a Single Plan&lt;/span&gt;&lt;/a&gt;. The configuration
parameters will be updated to reflect the removed role.&lt;/p&gt;
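&lt;p&gt;As a hedged illustration of the call above, the following Python sketch builds (without sending) the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;DELETE&lt;/span&gt;&lt;/code&gt; request and maps the two documented response codes to outcomes. The base URL and both function names are assumptions made for this example; the spec defines only the path and the response codes.&lt;/p&gt;

```python
import urllib.request

# Hypothetical endpoint root; the spec defines only the path beneath it.
TUSKAR_BASE_URL = "http://tuskar.example.com/v2"


def build_remove_role_request(plan_uuid: str, role_uuid: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that removes a role from a plan."""
    url = f"{TUSKAR_BASE_URL}/plans/{plan_uuid}/roles/{role_uuid}/"
    return urllib.request.Request(url, method="DELETE")


def describe_status(status: int) -> str:
    """Map the response codes listed in the spec to a short description."""
    outcomes = {
        200: "role removed; the body is the updated plan document",
        404: "no such plan, or the plan does not have that role",
    }
    return outcomes.get(status, "response code not covered by the spec")
```

&lt;p&gt;Keeping request construction separate from sending makes the URL and method easy to check without a running Tuskar service.&lt;/p&gt;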
&lt;hr class="docutils"/&gt;
&lt;p id="changing-plan-configuration"&gt;&lt;strong&gt;Changing a Plan’s Configuration Values&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;URL: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/plans/&amp;lt;plan-uuid&amp;gt;/&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Method: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;PATCH&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Description: Sets the values for one or more configuration parameters.&lt;/p&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This call will update the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;last_modified&lt;/span&gt;&lt;/code&gt; timestamp to indicate a change
has been made that will require an update to Heat to be made live.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Request Data: JSON document containing the parameter keys and values to set
for the plan.&lt;/p&gt;
&lt;p&gt;Request Example:&lt;/p&gt;
&lt;div class="highlight-json notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"database_host"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"value"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10.11.12.13"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"database_password"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"value"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"secret"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Response Codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;200 - if the update was successful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;400 - if one or more of the new values fails validation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;404 - if there is no plan with the given UUID&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Data:&lt;/p&gt;
&lt;p&gt;The same document describing the plan as from
&lt;a class="reference internal" href="#retrieve-single-plan"&gt;&lt;span class="std std-ref"&gt;Retrieve a Single Plan&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p id="list-roles"&gt;&lt;strong&gt;Retrieving Possible Roles&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;URL: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/roles/&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Method: &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;GET&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Description: Returns a list of all roles available in Tuskar.&lt;/p&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;There will be a separate entry for each version of a particular role.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Request Data: None&lt;/p&gt;
&lt;p&gt;Response Codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;200 - containing the available roles&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Data: A list of roles, where each role contains:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Name&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Version&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Response Example:&lt;/p&gt;
&lt;div class="highlight-json notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"uuid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3d46e510-6a63-4ed1-abd0-9306a451f8b4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"compute"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"version"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"description"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Nova Compute"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"uuid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"71d6c754-c89c-4293-9d7b-c4dcc57229f0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"compute"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"version"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"description"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Nova Compute"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"uuid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"651c26f6-63e2-4e76-9b60-614b51249677"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"controller"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"version"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;"description"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Controller Services"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
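&lt;p&gt;Because each version of a role is returned as a separate entry, a client will often want to collapse the list to the newest version per role name. The Python sketch below shows one way to do that. It assumes that versions are integer strings, as in the example response; the helper name is hypothetical.&lt;/p&gt;

```python
import json

# Sample data copied from the response example in this spec.
ROLES_JSON = """
[
  {"uuid": "3d46e510-6a63-4ed1-abd0-9306a451f8b4", "name": "compute",
   "version": "1", "description": "Nova Compute"},
  {"uuid": "71d6c754-c89c-4293-9d7b-c4dcc57229f0", "name": "compute",
   "version": "2", "description": "Nova Compute"},
  {"uuid": "651c26f6-63e2-4e76-9b60-614b51249677", "name": "controller",
   "version": "1", "description": "Controller Services"}
]
"""


def latest_roles(roles):
    """Return a dict mapping each role name to its highest-versioned entry."""
    latest = {}
    for role in roles:
        current = latest.get(role["name"])
        # Assumption: versions are integer strings, as in the sample above.
        if current is None or int(role["version"]) > int(current["version"]):
            latest[role["name"]] = role
    return latest


newest = latest_roles(json.loads(ROLES_JSON))
```

&lt;p&gt;The UUID of the selected entry is what a client would then submit when adding the role to a plan.&lt;/p&gt;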
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There are currently no alternate schemas proposed for the REST APIs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;These changes should have no additional security impact.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The potential performance issues revolve around Tuskar’s solution for storing
the cloud files &lt;a class="footnote-reference brackets" href="#id6" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;After these changes merge, there will be a period where the Tuskar CLI is out of date
with the new calls. The Tuskar UI will also need to be updated for the changes
in flow.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jdob&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement plan CRUD APIs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement role retrieval API&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write REST API documentation&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;These API changes are dependent on the rest of the Tuskar backend being
implemented, including the changes to storage and the template consolidation.&lt;/p&gt;
&lt;p&gt;Additionally, the assembly of roles (provider resources) into a Heat
environment is contingent on the conversion of the TripleO Heat templates &lt;a class="footnote-reference brackets" href="#id7" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Tempest testing should be added as part of the API creation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The REST API documentation will need to be updated accordingly.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id6" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id2"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id3"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id4"&gt;3&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/97553/"&gt;https://review.openstack.org/#/c/97553/&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id7" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id5"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/97939/"&gt;https://review.openstack.org/#/c/97939/&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Fri, 10 Apr 2020 00:00:00 </pubDate></item><item><title>Remove merge.py from TripleO Heat Templates</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/kilo/remove-mergepy.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-remove-mergepy"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-remove-mergepy&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; is where we’ve historically accumulated the technical debt for our
Heat templates &lt;a class="footnote-reference brackets" href="#id14" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;0&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; with the intention of migrating away from it when Heat meets
our templating needs.&lt;/p&gt;
&lt;p&gt;Its main functionality includes combining smaller template snippets into a
single template describing the full TripleO deployment, merging certain
resources together to reduce duplication while keeping the snippets themselves
functional as standalone templates, and support for manual scaling of Heat
resources.&lt;/p&gt;
&lt;p&gt;This spec describes the changes necessary to move towards templates
that do not depend on &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt;. We will use native Heat features
where we can and document the rest, possibly driving new additions to
the Heat template format.&lt;/p&gt;
&lt;p&gt;It is largely based on the April 2014 discussion in openstack-dev &lt;a class="footnote-reference brackets" href="#id15" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Because of the mostly undocumented nature of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; our templates are
difficult to understand or modify by newcomers (even those already familiar with
Heat).&lt;/p&gt;
&lt;p&gt;It has always been considered a short-term measure and Heat can now provide most
of what we need in our templates.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;We will start with making small correctness-preserving changes to our
templates and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; that move us onto using more Heat native
features. Where we cannot make the change for some reason, we will
file a bug with Heat and work with them to unblock the process.&lt;/p&gt;
&lt;p&gt;Once we get to a point where we have to make large changes to the
structure of our templates, we will split them off to new files and
enable them in our CI as parallel implementations.&lt;/p&gt;
&lt;p&gt;Once we are confident that the new templates fulfill the same
requirements as the original ones, we will deprecate the old ones,
deprecate &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; and switch to the new ones as the default.&lt;/p&gt;
&lt;p&gt;The list of action items necessary for the full transition is
below.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Remove the custom resource types&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;TripleO Heat templates and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; carry two custom types that (after the
move to software config &lt;a class="footnote-reference brackets" href="#id20" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, &lt;a class="footnote-reference brackets" href="#id21" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;) are no longer used for anything:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;OpenStack::ImageBuilder::Elements&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenStack::Role&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We will drop them from the templates and deprecate them in the merge tool.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Remove combining whitelisted resource types&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If we have two &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;AWS::AutoScaling::LaunchConfiguration&lt;/span&gt;&lt;/code&gt; resources with the same
name, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; will combine their &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Properties&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Metadata&lt;/span&gt;&lt;/code&gt;. Our
templates are no longer using this after the software-config update.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Port TripleO Heat templates to HOT&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With most of the non-Heat syntax out of the way, porting our CFN/YAML templates
to pure HOT format &lt;a class="footnote-reference brackets" href="#id16" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; should be straightforward.&lt;/p&gt;
&lt;p&gt;We will have to update &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; as well. We should be able to support both
the old format and HOT.&lt;/p&gt;
&lt;p&gt;We should be able to differentiate between the two by looking for the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;heat_template_version&lt;/span&gt;&lt;/code&gt; top-level section which is mandatory in the HOT
syntax.&lt;/p&gt;
&lt;p&gt;Most of the changes to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; should be around spelling (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Parameters&lt;/span&gt;&lt;/code&gt; -&amp;gt;
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;parameters&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Resources&lt;/span&gt;&lt;/code&gt; -&amp;gt; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;resources&lt;/span&gt;&lt;/code&gt;) and different names for
intrinsic functions, etc. (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Fn::GetAtt&lt;/span&gt;&lt;/code&gt; -&amp;gt; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;get_attr&lt;/span&gt;&lt;/code&gt;).&lt;/p&gt;
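&lt;p&gt;The detection rule and the renames named above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; code: the function names are hypothetical, templates are assumed to be already parsed into dictionaries, and only the three renames mentioned in the text are covered, so the result is not a complete HOT template.&lt;/p&gt;

```python
# Only the renames named in the text; a full port needs more than these.
KEY_RENAMES = {
    "Parameters": "parameters",
    "Resources": "resources",
    "Fn::GetAtt": "get_attr",
}


def is_hot(template):
    """A template is treated as HOT if it has the mandatory version key."""
    return "heat_template_version" in template


def rename_keys(node):
    """Recursively apply KEY_RENAMES to every mapping key in the template."""
    if isinstance(node, dict):
        return {KEY_RENAMES.get(key, key): rename_keys(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_keys(item) for item in node]
    return node


cfn_template = {
    "Parameters": {"Flavor": {"Type": "String"}},
    "Resources": {"server": {"Type": "OS::Nova::Server"}},
    "Outputs": {"ip": {"Value": {"Fn::GetAtt": ["server", "first_address"]}}},
}
converted = rename_keys(cfn_template)
```

&lt;p&gt;Keys that are not listed, such as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Outputs&lt;/span&gt;&lt;/code&gt; in the sample, pass through unchanged, which makes the partial nature of the sketch visible in its output.&lt;/p&gt;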
&lt;p&gt;This task will require syntactic changes to all of our templates and
unfortunately, it isn’t something different people can update bit by bit. We
should be able to update the undercloud and overcloud portions separately, but
we can’t e.g. just update a part of the overcloud. We are still putting
templates together with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; at this point and we would end up with a
template that has both CFN and HOT bits.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Move to Provider resources&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Heat allows passing in multiple templates when deploying a stack. These
templates can map to custom resource types. Each template would represent a role
(compute server, controller, block storage, etc.) and its &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;parameters&lt;/span&gt;&lt;/code&gt; and
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;outputs&lt;/span&gt;&lt;/code&gt; would map to the custom resource’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;properties&lt;/span&gt;&lt;/code&gt; and
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;attributes&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;These roles will be referenced from a master template (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud.yaml&lt;/span&gt;&lt;/code&gt;,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;undercloud.yaml&lt;/span&gt;&lt;/code&gt;) and eventually wrapped in a scaling resource
(&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::Heat::ResourceGroup&lt;/span&gt;&lt;/code&gt; &lt;a class="footnote-reference brackets" href="#id19" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;) or whatever scaling mechanism we adopt.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Provider resources represent fully functional standalone templates.
Any provider resource template can be passed to Heat and turned into a
stack or treated as a custom resource in a larger deployment.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Here’s a hypothetical outline of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute.yaml&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;flavor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
  &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
  &lt;span class="n"&gt;amqp_host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
  &lt;span class="n"&gt;nova_compute_driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;

&lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;compute_instance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Nova&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;flavor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;flavor&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;compute_deployment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Heat&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;StructuredDeployment&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compute_instance&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compute_config&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;input_values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;amqp_host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;amqp_host&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;nova_compute_driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;nova_compute_driver&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;compute_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Heat&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;StructuredConfig&lt;/span&gt;
      &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;apply&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="n"&gt;amqp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;amqp_host&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;compute_driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;nova_compute_driver&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We will use a similar structure for all the other roles (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;controller.yaml&lt;/span&gt;&lt;/code&gt;,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;block-storage.yaml&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;swift-storage.yaml&lt;/span&gt;&lt;/code&gt;, etc.). That is, each role will
contain the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::Nova::Server&lt;/span&gt;&lt;/code&gt;, the associated deployments and any other
resources required (random string generators, security groups, ports, floating
IPs, etc.).&lt;/p&gt;
&lt;p&gt;We can map the roles to custom types using Heat environments &lt;a class="footnote-reference brackets" href="#id18" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;role_map.yaml&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;resource_registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TripleO&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Compute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
  &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TripleO&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Controller&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
  &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TripleO&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BlockStorage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
  &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TripleO&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;SwiftStorage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;swift&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
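&lt;p&gt;To make the mapping concrete, here is a small Python sketch of the lookup that the registry implies. Heat performs this resolution itself; the code only illustrates the idea. The function name and the sample resources are hypothetical, while the registry entries are copied from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;role_map.yaml&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;

```python
# Entries copied from role_map.yaml in this spec.
RESOURCE_REGISTRY = {
    "OS::TripleO::Compute": "compute.yaml",
    "OS::TripleO::Controller": "controller.yaml",
    "OS::TripleO::BlockStorage": "block-storage.yaml",
    "OS::TripleO::SwiftStorage": "swift-storage.yaml",
}


def provider_templates(resources, registry=RESOURCE_REGISTRY):
    """Map each resource whose type is in the registry to its template file.

    Resources with native types (not in the registry) are left out.
    """
    return {
        name: registry[body["type"]]
        for name, body in resources.items()
        if body["type"] in registry
    }


# Hypothetical resources section of a master template.
sample_resources = {
    "compute0": {"type": "OS::TripleO::Compute"},
    "controller0": {"type": "OS::TripleO::Controller"},
    "private_net": {"type": "OS::Neutron::Net"},
}
resolved = provider_templates(sample_resources)
```

&lt;p&gt;Swapping the file a type points at changes the implementation of a role without touching the template that references it.&lt;/p&gt;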
&lt;p&gt;Lastly, we’ll have a master template that puts it all together.&lt;/p&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud.yaml&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;compute_flavor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
  &lt;span class="n"&gt;compute_image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
  &lt;span class="n"&gt;compute_amqp_host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
  &lt;span class="n"&gt;compute_driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;compute0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# defined in compute.yaml, type mapping in role_map.yaml&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TripleO&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Compute&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;flavor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compute_flavor&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compute_image&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;amqp_host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compute_amqp_host&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;nova_compute_driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compute_driver&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;controller0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# defined in controller.yaml, type mapping in role_map.yaml&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TripleO&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Controller&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;flavor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;controller_flavor&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;controller_image&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="o"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;keystone_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;URL&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;Overcloud&lt;/span&gt; &lt;span class="n"&gt;Keystone&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;
    &lt;span class="c1"&gt;# `keystone_url` is an output defined in the `controller.yaml` template.&lt;/span&gt;
    &lt;span class="c1"&gt;# We're referencing it here to expose it to the Heat user.&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;get_attr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;controller0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keystone_url&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;and similarly for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;undercloud.yaml&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;The individual roles (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute.yaml&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;controller.yaml&lt;/span&gt;&lt;/code&gt;) are
structured in such a way that they can be launched as standalone
stacks (i.e. in order to test the compute instance, one can type
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;heat&lt;/span&gt; &lt;span class="pre"&gt;stack-create&lt;/span&gt; &lt;span class="pre"&gt;-f&lt;/span&gt; &lt;span class="pre"&gt;compute.yaml&lt;/span&gt; &lt;span class="pre"&gt;-P&lt;/span&gt; &lt;span class="pre"&gt;...&lt;/span&gt;&lt;/code&gt;). Indeed, Heat treats
provider resources as nested stacks internally.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;5. Remove FileInclude from merge.py&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The goal of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;FileInclude&lt;/span&gt;&lt;/code&gt; was to keep individual Roles (to borrow a
loaded term from TripleO UI) viable as templates that can be launched
standalone. The canonical example is &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nova-compute-instance.yaml&lt;/span&gt;&lt;/code&gt; &lt;a class="footnote-reference brackets" href="#id17" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With the migration to provider resources, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;FileInclude&lt;/span&gt;&lt;/code&gt; is not necessary.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. Move the templates to Heat-native scaling&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Scaling of resources is currently handled by &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt;. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--scale&lt;/span&gt;&lt;/code&gt;
command line argument takes a resource name and duplicates it as needed (it’s
a bit more complicated than that, but that’s beside the point).&lt;/p&gt;
&lt;p&gt;Heat has a native scaling &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::Heat::ResourceGroup&lt;/span&gt;&lt;/code&gt; &lt;a class="footnote-reference brackets" href="#id19" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; resource that does
essentially the same thing:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;scaled_compute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Heat&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ResourceGroup&lt;/span&gt;
  &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
    &lt;span class="n"&gt;resource_def&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TripleO&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Compute&lt;/span&gt;
      &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;flavor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;baremetal&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rhel7&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This will create 42 instances of compute hosts.&lt;/p&gt;
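&lt;p&gt;As a minimal sketch, and assuming a hypothetical &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute_scale&lt;/span&gt;&lt;/code&gt; parameter that is not part of the existing templates, the count could be taken from a template parameter instead of being hard-coded, so that scaling becomes a matter of updating the stack with a new value:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;heat_template_version: 2013-05-23

parameters:
  # hypothetical parameter name, chosen for illustration only
  compute_scale:
    type: number
    default: 1
    constraints:
      - range: {min: 0}

resources:
  scaled_compute:
    type: OS::Heat::ResourceGroup
    properties:
      count: {get_param: compute_scale}
      resource_def:
        type: OS::TripleO::Compute
        properties:
          # remaining role properties omitted
          flavor: baremetal
          image: compute-image-rhel7
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;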
&lt;p&gt;&lt;strong&gt;7. Replace Merge::Map with scaling groups’ inner attributes&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We are using the custom &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Merge::Map&lt;/span&gt;&lt;/code&gt; helper function for getting values out of
scaled-out servers:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L642"&gt;Building a comma-separated list of RabbitMQ nodes&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L405"&gt;Getting the name of the first controller node&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L405"&gt;List of IP addresses of all controllers&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L585"&gt;Building the /etc/hosts file&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ResourceGroup&lt;/span&gt;&lt;/code&gt; resource supports selecting an attribute of an inner
resource as well as getting the same attribute from all resources and returning
them as a list.&lt;/p&gt;
&lt;p&gt;Example of getting the IP address of the first controller node:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_attr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;controller_group&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="mf"&gt;.0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctlplane&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
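&lt;p&gt;(&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;controller_group&lt;/span&gt;&lt;/code&gt; is the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ResourceGroup&lt;/span&gt;&lt;/code&gt; of our controller nodes, and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ctlplane&lt;/span&gt;&lt;/code&gt;
is the name of our control plane network.)&lt;/p&gt;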
&lt;p&gt;Example of getting the list of names of all of the controller nodes:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_attr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;controller_group&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The more complex uses of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Merge::Map&lt;/span&gt;&lt;/code&gt; involve formatting the returned data in
some way, for example building a list of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;{ip:&lt;/span&gt; &lt;span class="pre"&gt;...,&lt;/span&gt; &lt;span class="pre"&gt;name:&lt;/span&gt; &lt;span class="pre"&gt;...}&lt;/span&gt;&lt;/code&gt; dictionaries
for haproxy or generating the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/etc/hosts&lt;/span&gt;&lt;/code&gt; file.&lt;/p&gt;
&lt;p&gt;Since our ResourceGroups will not be using Nova servers directly, but rather the
custom role types using provider resources and environments, we can put this
data formatting into the role’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;outputs&lt;/span&gt;&lt;/code&gt; section and then use the same
mechanism as above.&lt;/p&gt;
&lt;p&gt;Example of building out the haproxy node entries:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="c1"&gt;# overcloud.yaml:&lt;/span&gt;
&lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;controller_group&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Heat&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ResourceGroup&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;controller_scale&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="n"&gt;resource_def&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TripleO&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Controller&lt;/span&gt;
        &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="o"&gt;...&lt;/span&gt;

  &lt;span class="n"&gt;controllerConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Heat&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;StructuredConfig&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="o"&gt;...&lt;/span&gt;
      &lt;span class="n"&gt;haproxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_attr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;controller_group&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;haproxy_node_entry&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;



&lt;span class="c1"&gt;# controller.yaml:&lt;/span&gt;
&lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Nova&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="o"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;haproxy_node_entry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;dictionary&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;configuring&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt;
      &lt;span class="n"&gt;haproxy&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_attr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctlplane&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
      &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;get_attr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
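&lt;p&gt;The same pattern could plausibly cover the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/etc/hosts&lt;/span&gt;&lt;/code&gt; case. The following is only a sketch under stated assumptions: the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hosts_entry&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hosts_entries&lt;/span&gt;&lt;/code&gt; output names are hypothetical, and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;list_join&lt;/span&gt;&lt;/code&gt; assumes the 2014-10-16 (Juno) HOT version. Each role formats one line with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;str_replace&lt;/span&gt;&lt;/code&gt; and the overall template joins the lines collected from the group:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;# controller.yaml:
outputs:
  # hypothetical output name
  hosts_entry:
    description: One hosts-file line (address and name) for this node
    value:
      str_replace:
        template: IP NAME
        params:
          IP: {get_attr: [controller, networks, ctlplane, 0]}
          NAME: {get_attr: [controller, name]}

# overcloud.yaml (list_join assumes heat_template_version 2014-10-16):
outputs:
  # hypothetical output name
  hosts_entries:
    description: Newline-separated hosts-file lines for all controllers
    value:
      list_join:
        - "\n"
        - {get_attr: [controller_group, hosts_entry]}
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;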
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;This proposal is very t-h-t and Heat specific. One alternative is to do nothing
and keep using and evolving &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt;. That was never the intent, and most
members of the core team do not consider this a viable long-term option.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;This proposal does not affect the overall functionality of TripleO in any way.
It just changes the way TripleO Heat templates are stored and written.&lt;/p&gt;
&lt;p&gt;If anything, this will move us towards more standard and thus more easily
auditable templates.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;There should be no impact for the users of vanilla TripleO.&lt;/p&gt;
&lt;p&gt;More advanced users may want to customise the existing Heat templates or write
their own. That will be made easier when we rely on standard Heat features only.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;This moves some of the template-assembling burden from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; to Heat. It
will likely also end up producing more resources and nested stacks in the
background.&lt;/p&gt;
&lt;p&gt;As far as we’re aware, no one has tested these features at the scale we are
inevitably going to hit.&lt;/p&gt;
&lt;p&gt;Before we land changes that can affect this (provider config and scaling) we
need to have scale tests in Tempest running TripleO to make sure Heat can cope.&lt;/p&gt;
&lt;p&gt;These tests can be modeled after the &lt;a class="reference external" href="https://github.com/openstack/tempest/blob/master/tempest/scenario/test_large_ops.py"&gt;large_ops&lt;/a&gt; scenario: a Heat template that
creates and destroys a stack of 50 Nova server resources with associated
software configs.&lt;/p&gt;
&lt;p&gt;We should have two tests to assess the before and after performance:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;A single HOT template with 50 copies of the same server resource and software
config/deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A template with a single server and its software config/deploys, an
environment file with a custom type mapping and an overall template that
wraps the new type in a ResourceGroup with the count of 50.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
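&lt;p&gt;The second test could be laid out roughly as follows. This is a sketch only: the file names and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::ScaleTestServer&lt;/span&gt;&lt;/code&gt; type name are hypothetical, and the single-server template itself (server plus software config and deployment) is not shown:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;# scale_test_env.yaml (hypothetical environment file):
resource_registry:
  # maps the custom type to the single-server template
  OS::TripleO::ScaleTestServer: scale_test_server.yaml

# scale_test.yaml (hypothetical wrapper template):
heat_template_version: 2013-05-23

resources:
  servers:
    type: OS::Heat::ResourceGroup
    properties:
      count: 50
      resource_def:
        type: OS::TripleO::ScaleTestServer
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;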
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Deployers can keep using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; and the existing Heat templates as before;
existing scripts should not break.&lt;/p&gt;
&lt;p&gt;With the new templates, Heat will be called directly and will need the resource
registry (in a Heat environment file). This will mean a change in the deployment
process.&lt;/p&gt;
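&lt;p&gt;For illustration, the environment file handed to Heat (for example through the Heat client’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;-e&lt;/span&gt;&lt;/code&gt; option) might look roughly like the sketch below. The file name is an assumption, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Compute&lt;/span&gt;&lt;/code&gt; mapping is assumed to point at &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute.yaml&lt;/span&gt;&lt;/code&gt;, and the parameter values are placeholders reused from the examples in this spec:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;# overcloud-env.yaml (hypothetical file name):
resource_registry:
  OS::TripleO::Compute: compute.yaml
  OS::TripleO::Controller: controller.yaml

parameters:
  # placeholder values for parameters declared in overcloud.yaml
  compute_flavor: baremetal
  compute_image: compute-image-rhel7
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;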
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;This should not affect non-Heat and non-TripleO OpenStack developers.&lt;/p&gt;
&lt;p&gt;There will likely be a slight learning curve for the TripleO developers who want
to write and understand our Heat templates. Chances are, we will also encounter
bugs or unforeseen complications while swapping &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt; for Heat features.&lt;/p&gt;
&lt;p&gt;The impact on Heat developers would involve processing the bugs and feature
requests we uncover. This will hopefully not be an avalanche.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Tomas Sedovic &amp;lt;lp: tsedovic&amp;gt; &amp;lt;irc: shadower&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Remove the custom resource types&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove combining whitelisted resource types&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Port TripleO Heat templates to HOT&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move to Provider resources&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove FileInclude from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;merge.py&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move the templates to Heat-native scaling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Replace Merge::Map with scaling groups’ inner attributes&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The Juno release of Heat&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Being able to kill specific nodes in Heat (for scaling down or because they’re
misbehaving). Relevant Heat blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/heat/+spec/autoscaling-parameters"&gt;autoscaling-parameters&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;All of these changes will be made to the tripleo-heat-templates repository and
should be testable by our CI just as any other t-h-t change.&lt;/p&gt;
&lt;p&gt;In addition, we will need to add Tempest scenarios for scale to ensure Heat can
handle the load.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will need to update the &lt;a class="reference external" href="http://docs.openstack.org/developer/tripleo-incubator/devtest.html"&gt;devtest&lt;/a&gt;, &lt;a class="reference external" href="http://docs.openstack.org/developer/tripleo-incubator/deploying.html"&gt;Deploying TripleO&lt;/a&gt; and &lt;a class="reference external" href="http://docs.openstack.org/developer/tripleo-incubator/userguide.html"&gt;Using TripleO&lt;/a&gt;
documentation and create a guide for writing TripleO templates.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id14" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;0&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates"&gt;https://github.com/openstack/tripleo-heat-templates&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id15" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2014-April/031915.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2014-April/031915.html&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id16" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://docs.openstack.org/developer/heat/template_guide/hot_guide.html"&gt;http://docs.openstack.org/developer/heat/template_guide/hot_guide.html&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id17" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id8"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/master/nova-compute-instance.yaml"&gt;https://github.com/openstack/tripleo-heat-templates/blob/master/nova-compute-instance.yaml&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id18" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://docs.openstack.org/developer/heat/template_guide/environment.html"&gt;http://docs.openstack.org/developer/heat/template_guide/environment.html&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id19" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id6"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id9"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::ResourceGroup"&gt;http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::ResourceGroup&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id20" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;8&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/81666/"&gt;https://review.openstack.org/#/c/81666/&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id21" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id4"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/93319/"&gt;https://review.openstack.org/#/c/93319/&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Fri, 10 Apr 2020 00:00:00 </pubDate></item><item><title>Enable TripleO to Deploy Ceph via Ceph Ansible</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/tripleo-ceph-ansible-integration.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-ceph-ansible"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-ceph-ansible&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Enable TripleO to deploy Ceph via Ceph Ansible using a new Mistral
workflow. This will make the Ceph installation less tightly coupled
with TripleO but the existing operator interfaces to deploy Ceph with
TripleO will still be supported until the end of the Queens release.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The Ceph community maintains ceph-ansible to deploy and manage Ceph.
Members of the TripleO community maintain similar tools too. This is
a proposal to have TripleO trigger the Ceph community’s tools via
Mistral as an alternative method to deploy and manage Ceph.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="benefits-of-using-another-project-to-deploy-and-manage-ceph"&gt;
&lt;h2&gt;Benefits of using another project to deploy and manage Ceph&lt;/h2&gt;
&lt;section id="avoid-duplication-of-effort"&gt;
&lt;h3&gt;Avoid duplication of effort&lt;/h3&gt;
&lt;p&gt;If there is a feature or bug fix in the Ceph community’s tools not in
the tools used by TripleO, then members of the TripleO community could
allow deployers to use those features directly instead of writing
their own implementation. If this proposal is successful, it could
remove the need to maintain two code bases (along with their bug
fixes and testing) in the future. For example, if
ceph-ansible fixed a bug to correctly handle alternative system paths
to block devices, e.g. /dev/disk/by-path/ in lieu of /dev/sdb, then
the same bug would not need to be fixed in puppet-ceph. This detail
would also be nicely abstracted from a deployer because this spec
proposes maintaining parity with TripleO Heat Templates. Thus, the
deployer would not need to change the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ceph::profile::params::osds&lt;/span&gt;&lt;/code&gt;
parameter as the same list of OSDs would work.&lt;/p&gt;
&lt;p&gt;In taking this approach it’s possible for there to be cases where
TripleO’s deployment architecture may have unique features that don’t
exist within ceph-ansible. In these cases, effort may be needed to
ensure that such features remain in parity with this approach.
In no way does this proposal enable a TripleO deployer to bypass
TripleO and use ceph-ansible directly. Also, because Ceph is not an
OpenStack service itself but a service that TripleO uses, this
approach remains consistent with the TripleO mission.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="consistency-between-openstack-and-non-openstack-ceph-deployments"&gt;
&lt;h3&gt;Consistency between OpenStack and non-OpenStack Ceph deployments&lt;/h3&gt;
&lt;p&gt;A deployer may seek assistance from the Ceph community with a Ceph
deployment, and this process will be simpler if both deployments
are done using the same tool.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="enable-decoupling-of-ceph-management-from-tripleo"&gt;
&lt;h3&gt;Enable Decoupling of Ceph management from TripleO&lt;/h3&gt;
&lt;p&gt;The complexity of Ceph management can be moved to a different tool
and abstracted, where appropriate, from TripleO making the Ceph
management aspect of TripleO less complex. Combining this with
containerized Ceph would offer flexible deployment options. This
is a deployer benefit that is difficult to deliver today.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="features-in-the-ceph-community-s-tools-not-in-tripleo-s-tools"&gt;
&lt;h3&gt;Features in the Ceph community’s tools not in TripleO’s tools&lt;/h3&gt;
&lt;p&gt;The Ceph community tool, ceph-ansible &lt;a class="footnote-reference brackets" href="#id19" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, offers benefits to
OpenStack users not found in TripleO’s tool chain, including playbooks
to deploy Ceph in containers and migrate a non-containerized
deployment to a containerized deployment without downtime. Also,
making the Ceph deployment in TripleO less tightly coupled, by moving
it into a new Mistral workflow, would make it easier in a future
release to add a business logic layer through a tool like Tendrl &lt;a class="footnote-reference brackets" href="#id20" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;,
to offer additional Ceph policy based configurations and possibly a
graphical tool to see the status of the Ceph cluster. However, the
scope of this proposal for Pike does not include Tendrl and instead
takes the first step towards deploying Ceph via a Mistral workflow by
triggering ceph-ansible directly. After the Pike cycle is complete,
triggering Tendrl may be considered in a future spec.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The ceph-ansible &lt;a class="footnote-reference brackets" href="#id19" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; project provides a set of playbooks to deploy
and manage Ceph. A proof of concept &lt;a class="footnote-reference brackets" href="#id21" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; has been written which uses
two custom Mistral actions from the experimental
mistral-ansible-actions project &lt;a class="footnote-reference brackets" href="#id22" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; to have a Mistral workflow on the
undercloud trigger ceph-ansible to produce a working hyperconverged
overcloud.&lt;/p&gt;
&lt;p&gt;The deployer experience to stand up Ceph with TripleO at the end of
this cycle should be the following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The deployer chooses to deploy a role containing any of the
Ceph server services: CephMon, CephOSD, CephRbdMirror, CephRgw,
or CephMds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployer provides the same Ceph parameters they provide today
in a Heat env file, e.g. a list of OSDs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployer starts the deploy and gets an overcloud with Ceph.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Thus, the deployment experience remains the same for the deployer but
behind the scenes a Mistral workflow is started which triggers
ceph-ansible. The details of the Mistral workflow to accomplish this
follow.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="tripleo-ceph-deployment-via-mistral"&gt;
&lt;h3&gt;TripleO Ceph Deployment via Mistral&lt;/h3&gt;
&lt;p&gt;TripleO’s workflow to deploy a Ceph cluster would be changed so that
there are two ways to deploy a Ceph cluster: the way currently
supported by TripleO and the way described in this proposal.&lt;/p&gt;
&lt;p&gt;The workflow described here assumes the following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;A deployer chooses to deploy Ceph server services from the
following list of five services found in THT’s roles_data.yaml:
CephMon, CephOSD, CephRbdMirror, CephRgw, or CephMds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployer chooses to include new Heat environment files which
will be in THT when this spec is implemented. The new Heat
environment file will change the implementation of any of the five
services from the previous step. Using storage-environment.yaml,
which defaults to Ceph deployed by puppet-ceph, will still trigger
the Ceph deployment by puppet-ceph. However, if the new Heat
environment files are included instead of storage-environment.yaml,
then the implementation of the service will be done by ceph-ansible
instead, which already configures these services for hosts under
the following roles in the Ansible inventory: mons, osds, mdss,
rgws, or rbdmirrors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The undercloud has a directory called /usr/share/ceph-ansible
which contains the ceph-ansible playbooks described in this spec.
It will be present because the undercloud installation will include
the ceph-ansible package.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mistral on the undercloud will contain two custom actions called
&lt;cite&gt;ansible&lt;/cite&gt; and &lt;cite&gt;ansible-playbook&lt;/cite&gt; (or similar). It will also contain
a workflow for each task below, which can be observed by running
&lt;cite&gt;openstack workflow list&lt;/cite&gt;. This is assumed because the
tripleo-common package will be modified to ship these actions and
they will be available after undercloud installation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Heat will ship a new CustomResource type like
OS::Mistral::WorkflowExecution &lt;a class="footnote-reference brackets" href="#id23" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, which will execute custom
Mistral workflows.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
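&lt;p&gt;To illustrate the mapping assumed in item 2 above, the following
minimal Python sketch groups hosts into the ceph-ansible inventory
groups named in this spec. It is an illustration only and not part of
the proposed implementation: the &lt;cite&gt;SERVICE_TO_GROUP&lt;/cite&gt; table, the function
names and the example addresses are hypothetical, and the proposal
itself relies on a dynamic inventory rather than a rendered INI file.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
# Hypothetical mapping from the TripleO service names listed above to the
# ceph-ansible inventory groups named in this spec.
SERVICE_TO_GROUP = {
    "CephMon": "mons",
    "CephOSD": "osds",
    "CephMds": "mdss",
    "CephRgw": "rgws",
    "CephRbdMirror": "rbdmirrors",
}


def build_inventory_groups(host_services):
    """Group hosts by ceph-ansible inventory group.

    host_services maps a host name or IP to the list of TripleO services
    deployed on it. Services that are not Ceph server services are ignored.
    """
    groups = {}
    for host in sorted(host_services):
        for service in host_services[host]:
            group = SERVICE_TO_GROUP.get(service)
            if group is not None:
                groups.setdefault(group, []).append(host)
    return groups


def render_ini(groups):
    """Render the groups as a static INI-style Ansible inventory."""
    lines = []
    for group in sorted(groups):
        lines.append("[" + group + "]")
        lines.extend(groups[group])
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example addresses only; a hyperconverged node carries both Ceph and
    # non-Ceph services.
    example = {
        "192.0.2.10": ["CephMon", "NovaCompute"],
        "192.0.2.11": ["CephOSD"],
        "192.0.2.12": ["CephOSD", "CephMds"],
    }
    print(render_ini(build_inventory_groups(example)))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Keeping the mapping in a single table means a role that composes
several Ceph services simply appears in several groups.&lt;/p&gt;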
&lt;p&gt;The standard TripleO workflow, as executed by a deployer, will create
a custom Heat resource which starts an independent Mistral workflow to
interact with ceph-ansible. An example of such a Heat resource would be
OS::Mistral::WorkflowExecution &lt;a class="footnote-reference brackets" href="#id23" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Each independent Mistral workflow may be implemented directly in
tripleo-common/workbooks. A separate Mistral workbook will be created
for each goal described below:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Initial deployment of OpenStack and Ceph&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Adding additional Ceph OSDs to existing OpenStack and Ceph clusters&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The initial goal for the Pike cycle will be to maintain feature parity
with what is possible today in TripleO and puppet-ceph but with
containerized Ceph. Additional Mistral workflows may be written, time
permitting or in a future cycle, to add new features to TripleO’s Ceph
deployment which leverage ceph-ansible playbooks to shrink the Ceph
Cluster and safely remove an OSD or to perform maintenance on the
cluster by using Ceph’s ‘noout’ flag so that the maintenance does not
result in more data migration than necessary.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="initial-deployment-of-openstack-and-ceph"&gt;
&lt;h3&gt;Initial deployment of OpenStack and Ceph&lt;/h3&gt;
&lt;p&gt;The sequence of events for this new Mistral workflow and ceph-ansible
to be triggered during initial deployment with TripleO follows:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Define the Overcloud on the Undercloud in Heat. This includes the
Heat parameters that are related to storage which will later be
passed to ceph-ansible via a Mistral workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt; with standard Ceph options but
including a new Heat environment file to make the implementation
of the service deployment use ceph-ansible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The undercloud assembles and uploads the deployment plan to the
undercloud Swift and Mistral environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mistral starts the workflow to deploy the Overcloud and interfaces
with Heat accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A point in the deployment is reached where the Overcloud nodes are
imaged, booted, and networked. At that point the undercloud has
access to the provisioning or management IPs of the Overcloud
nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A new Heat Resource is created which starts a Mistral workflow to
deploy Ceph on the systems with any of the five Ceph server
services: CephMon, CephOSD, CephRbdMirror, CephRgw, or
CephMds &lt;a class="footnote-reference brackets" href="#id23" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The servers which host Ceph services have their relevant firewall
ports opened according to the needs of their service, e.g. the Ceph
monitor firewalls are configured to accept connections on TCP
port 6789 &lt;a class="footnote-reference brackets" href="#id24" id="id9" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Heat resource is passed the same parameters normally found in
the tripleo-heat-templates environments/storage-environment.yaml
but instead through a new Heat environment file. Additional files
may be passed to include overrides, e.g. the list of OSD disks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Heat resource passes its parameters to the Mistral workflow as
parameters. This will include information about which hosts should
have which of the five Ceph server services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Mistral workflow translates these parameters so that they match
the parameters that ceph-ansible expects, e.g.
ceph::profile::params::osds would become devices though they’d have
the same content, which would be a list of block devices. The
translation entails building an argument list that may be passed
to the playbook by calling &lt;cite&gt;ansible-playbook --extra-vars&lt;/cite&gt;.
Typically ceph-ansible uses modified files in the group_vars
directory but in this case, no files are modified and instead the
parameters are passed programmatically. Thus, the playbooks in
/usr/share/ceph-ansible may be run unaltered and that will be the
default directory. However, it will be possible to pass an
alternative location for the /usr/share/ceph-ansible playbook as
an argument. No playbooks are run yet at this stage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Mistral environment is updated to generate a new SSH key-pair
for ceph-ansible and the Overcloud nodes using the same process
that is used to create the SSH keys for TripleO validations and
install the public key on Overcloud nodes. After this environment
update it will be possible to run &lt;cite&gt;mistral environment-get
ssh_keys_ceph&lt;/cite&gt; on the undercloud and see the public and private
keys in JSON.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Mistral Action Plugin &lt;cite&gt;ansible-playbook&lt;/cite&gt; is called and passed
the list of parameters as described earlier. The dynamic ansible
inventory used by tripleo-validations is used with the &lt;cite&gt;-i&lt;/cite&gt;
option. In order for ceph-ansible to work as usual there must be
groups called &lt;cite&gt;[mons]&lt;/cite&gt; and &lt;cite&gt;[osds]&lt;/cite&gt; in the inventory, in addition to
optional groups for &lt;cite&gt;[mdss]&lt;/cite&gt;, &lt;cite&gt;[rgws]&lt;/cite&gt;, or &lt;cite&gt;[rbdmirrors]&lt;/cite&gt;.
Modifications to the tripleo-validations project’s
tripleo-ansible-inventory script may be made to support this, or a
derivative work of the same as shipped by TripleO common. The SSH
private key for the heat-admin user and the provisioning or
management IPs of the Overcloud nodes are what Ansible will use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Mistral workflow computes the number of forks in Ansible
according to the number of machines that are going to be
bootstrapped and will pass this number with &lt;cite&gt;ansible-playbook
--forks&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mistral verifies that the Ansible ping module can execute &lt;cite&gt;ansible
$group -m ping&lt;/cite&gt; for any group in mons, osds, mdss, rgws, or
rbdmirrors, that was requested by the deployer. For example, if the
deployer only specified the CephMon and CephOSD service, then
Mistral will only run &lt;cite&gt;ansible mons -m ping&lt;/cite&gt; and &lt;cite&gt;ansible osds -m
ping&lt;/cite&gt;. The Ansible ping module will SSH into each host as the
heat-admin user with the key that was generated as described
previously. If this fails, then the deployment fails.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mistral starts the Ceph install using the &lt;cite&gt;ansible-playbook&lt;/cite&gt;
action.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Mistral workflow creates a Zaqar queue to send progress
information back to the client (CLI or web UI).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The workflow posts messages to the “tripleo” Zaqar queue or the
queue name provided to the original deploy workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If there is a problem during the deployment, its status may be seen
by running &lt;cite&gt;openstack workflow execution list | grep ceph&lt;/cite&gt; and in the logs
at /var/log/mistral/{engine.log,executor.log}. Running &lt;cite&gt;openstack
stack resource list&lt;/cite&gt; would show the custom Heat resource that
started the Mistral workflow, but &lt;cite&gt;openstack workflow execution
list&lt;/cite&gt; and &lt;cite&gt;openstack workflow task list&lt;/cite&gt; would contain more details
about what steps completed within the Mistral workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Ceph deployment is done in containers in a way which must
prevent any configuration file conflict for any composed service,
e.g. if a Nova compute container (as deployed by TripleO) and a
Ceph OSD container are on the same node, then they must have
different ceph.conf files, even if those files have the same
content. Though ceph-ansible will manage ceph.conf for Ceph
services and puppet-ceph will still manage ceph.conf for OpenStack
services, the two tools will never try to manage the same ceph.conf
because each file will be in a different location on the container
host and bind mounted to /etc/ceph/ceph.conf within a different
container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After the Mistral workflow is completed successfully, the custom
Heat resource is considered successfully created. If the Mistral
workflow does not complete successfully, then the Heat resource
is not considered successfully created. TripleO should handle this
the same way that it handles any Heat resource that failed to be
created. For example, because the workflow is idempotent, if the
resource creation fails because the wrong parameter was passed or
because of a temporary network issue, the deployer could simply run
a stack-update; the Mistral workflow would run again and, if the
issues which caused the first run to fail were resolved, the
deployment should succeed. Similarly, if a user updates a parameter,
e.g. a new disk is added to &lt;cite&gt;ceph::profile::params::osds&lt;/cite&gt;, then the
workflow will run again without breaking the state of the running
Ceph cluster but it will configure the new disk.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After the dependency of the previous step is satisfied, the TripleO
Ceph external Heat resource is created to configure the appropriate
Overcloud nodes as Ceph clients.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the CephRGW service, hieradata will be emitted so that it may
be used for the haproxy listener setup and keystone users setup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Overcloud deployment continues as if it was using an external
Ceph cluster.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
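&lt;p&gt;The parameter translation, connectivity check and playbook
invocation in the steps above could be sketched roughly as follows.
This is an illustration under stated assumptions and not the workflow
itself: the &lt;cite&gt;PARAM_TO_VAR&lt;/cite&gt; table, the function names, the inventory
placeholder and the playbook file name are hypothetical. Only the
rename of &lt;cite&gt;ceph::profile::params::osds&lt;/cite&gt; to &lt;cite&gt;devices&lt;/cite&gt; and the
command-line options come from the text above.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import json

# Hypothetical translation table. Only the osds-to-devices rename comes
# from this spec; a real table would need one entry per supported parameter.
PARAM_TO_VAR = {
    "ceph::profile::params::osds": "devices",
}


def translate_params(heat_params):
    """Rename known Heat parameters to ceph-ansible variable names."""
    return {
        PARAM_TO_VAR[name]: value
        for name, value in heat_params.items()
        if name in PARAM_TO_VAR
    }


def build_ping_argvs(inventory, groups):
    """Build one 'ansible GROUP -m ping' command per requested group."""
    return [["ansible", group, "-i", inventory, "-m", "ping"]
            for group in groups]


def build_playbook_argv(playbook, inventory, extra_vars, forks):
    """Build the ansible-playbook argument list.

    Variables are passed as a JSON string so that no file under
    group_vars has to be modified.
    """
    return [
        "ansible-playbook",
        "-i", inventory,
        "--forks", str(forks),
        "--extra-vars", json.dumps(extra_vars, sort_keys=True),
        playbook,
    ]


if __name__ == "__main__":
    heat_params = {"ceph::profile::params::osds": ["/dev/sdb", "/dev/sdc"]}
    inventory = "tripleo-ansible-inventory"  # placeholder inventory path
    for argv in build_ping_argvs(inventory, ["mons", "osds"]):
        print(" ".join(argv))
    print(build_playbook_argv(
        "/usr/share/ceph-ansible/site.yml",  # placeholder playbook name
        inventory,
        translate_params(heat_params),
        forks=2,
    ))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Building argument lists instead of shell strings avoids quoting
problems when the JSON passed to &lt;cite&gt;--extra-vars&lt;/cite&gt; contains spaces.&lt;/p&gt;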
&lt;/section&gt;
&lt;section id="adding-additional-ceph-osd-nodes-to-existing-openstack-and-ceph-clusters"&gt;
&lt;h3&gt;Adding additional Ceph OSD Nodes to existing OpenStack and Ceph clusters&lt;/h3&gt;
&lt;p&gt;The process to add an additional Ceph OSD node is similar to the
process to deploy the OSDs along with the Overcloud:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Introspect the new hardware to host the OSDs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the Heat environment file containing the node counts, increment
the CephStorageCount.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run &lt;cite&gt;openstack overcloud deploy&lt;/cite&gt; with standard Ceph options and the
environment file which specifies the implementation of the Ceph
deployment via ceph-ansible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The undercloud updates the deployment plan.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mistral starts the workflow to update the Overcloud and interfaces
with Heat accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A point in the deployment is reached where the new Overcloud nodes
are imaged, booted, and networked. At that point the undercloud has
access to the provisioning or management IPs of the Overcloud
nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A new Heat Resource is created which starts a Mistral workflow to
add new Ceph OSDs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TCP ports 6800:7300 are opened on the OSD host &lt;a class="footnote-reference brackets" href="#id24" id="id10" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Mistral environment already has an SSH key-pair as described in
the initial deployment scenario. The same process that is used to
install the public SSH key on Overcloud nodes for TripleO
validations is used to install the SSH keys for ceph-ansible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If necessary, the Mistral workflow updates the number of forks in
Ansible according to the new number of machines that are going to
be bootstrapped.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The dynamic Ansible inventory will contain the new node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mistral confirms that Ansible can execute &lt;cite&gt;ansible osds -m ping&lt;/cite&gt;.
This causes Ansible to SSH as the heat-admin user into all of the
CephOsdAnsible nodes, including the new nodes. If this fails, then
the update fails.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mistral uses the Ceph variables found in Heat as described in the
initial deployment scenario.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mistral runs the osd-configure.yaml playbook from ceph-ansible to
add the extra Ceph OSD server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The OSDs on the server are each deployed in their own containers
and &lt;cite&gt;docker ps&lt;/cite&gt; will list each OSD container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After the Mistral workflow is completed, the Custom Heat resource
is considered to be updated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No changes are necessary for the TripleO Ceph external Heat
resource since the Overcloud Ceph clients only need information
about new OSDs from the Ceph monitors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Overcloud deployment continues as if it was using an external
Ceph cluster.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
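&lt;p&gt;This spec does not define how the fork count is derived from the
number of machines. One possible reading, shown here as a sketch only,
is one fork per distinct host in the inventory, bounded by a ceiling.
The function name and the ceiling of 50 are assumptions made for this
example.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
def compute_forks(inventory_groups, ceiling=50):
    """Return one fork per distinct host, between 1 and the ceiling.

    inventory_groups maps an inventory group name to its list of hosts.
    A host that appears in several groups is counted once. The ceiling
    is an assumption of this sketch, not a value taken from the spec.
    """
    hosts = set()
    for members in inventory_groups.values():
        hosts.update(members)
    return max(1, min(len(hosts), ceiling))


if __name__ == "__main__":
    before = {"mons": ["mon0"], "osds": ["osd0", "osd1"]}
    after = {"mons": ["mon0"], "osds": ["osd0", "osd1", "osd2"]}
    # Scaling up by one OSD node raises the fork count by one.
    print(compute_forks(before), compute_forks(after))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;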
&lt;/section&gt;
&lt;section id="containerization-of-configuration-files"&gt;
&lt;h3&gt;Containerization of configuration files&lt;/h3&gt;
&lt;p&gt;As described in the Containerize TripleO spec, configuration files
for the containerized service will be generated by Puppet and then
passed to the containerized service using a configuration volume &lt;a class="footnote-reference brackets" href="#id25" id="id11" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
A similar containerization feature is already supported by
ceph-ansible, which uses the following sequence to generate the
ceph.conf configuration file.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Ansible generates a ceph.conf on a monitor node&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ansible runs the monitor container and bind mounts /etc/ceph&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No modification is made to the ceph.conf&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ansible copies the ceph.conf to the Ansible server&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ansible copies the ceph.conf and keys to the appropriate machine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ansible runs the OSD container and bind mounts /etc/ceph&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No modification is made to the ceph.conf&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These similar processes are compatible, even in the case of container
hosts which run more than one OpenStack service but which each need
their own copy of the configuration file per container. For example,
consider a containerization node which hosts both Nova compute and Ceph
OSD services. In this scenario, the Nova compute service would be a
Ceph client and puppet-ceph would generate its ceph.conf and the Ceph
OSD service would be a Ceph server and ceph-ansible would generate its
ceph.conf. It is necessary for Puppet to configure the Ceph client
because Puppet configures the other OpenStack related configuration
files as is already provided by TripleO. Each generated ceph.conf
file would need to be stored in its own directory on the
containerization host to avoid conflicts, and the directories could be
mapped to specific containers. For example, host0 could have the
following versions of foo.conf for two different containers:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;host0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;container1&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;---&lt;/span&gt; &lt;span class="n"&gt;generated&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;host0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;container2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;---&lt;/span&gt; &lt;span class="n"&gt;generated&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;When each container is started on the host, the different
configuration files could then be mapped to the different containers:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;docker&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="n"&gt;container1&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;container1&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;
&lt;span class="n"&gt;docker&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="n"&gt;container2&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;container2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In the above scenario, it is necessary for both configuration files
to be generated from the same parameters. I.e. both Puppet and Ansible
will use the same values from the Heat environment file, but will
generate the configuration files differently. After the configuration
programs have run it won’t matter that Puppet idempotently updated
lines of the ceph.conf and that Ansible used a Jinja2 template. What
will matter is that both configuration files have the same value,
e.g. the same FSID.&lt;/p&gt;
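&lt;p&gt;A check along the following lines could confirm that two
independently generated ceph.conf files agree on the values that
matter. It is a sketch only: the function names are hypothetical, the
FSID and addresses are made-up example values, and it assumes the
relevant options live in the &lt;cite&gt;[global]&lt;/cite&gt; section.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import configparser


def read_global(conf_text):
    """Return the [global] options of a ceph.conf as a dict."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        return {}
    return dict(parser["global"])


def mismatched_keys(client_conf, server_conf, keys=("fsid",)):
    """List the keys that are missing from or differ between both files."""
    client = read_global(client_conf)
    server = read_global(server_conf)
    return [
        key for key in keys
        if key not in client
        or key not in server
        or client[key] != server[key]
    ]


# Example contents only. The two files are formatted differently, as
# they would be when written by two different tools.
PUPPET_CONF = """[global]
fsid = 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
mon_host = 192.0.2.10
"""

ANSIBLE_CONF = """[global]
mon host = 192.0.2.10
fsid=4b5c8c0a-ff60-454b-a1b4-9747aa737d19
"""

if __name__ == "__main__":
    # An empty list means both files carry the same FSID.
    print(mismatched_keys(PUPPET_CONF, ANSIBLE_CONF))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;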
&lt;p&gt;Configuration files generated as described in the Containerize TripleO
spec are not stored in the container host’s /etc directory before
being passed to the container guest with a
bind mount. By default, ceph-ansible generates the initial ceph.conf
on the container host’s /etc directory before it uses a bind mount to
pass it through to the container. In order to be consistent with the
Containerize TripleO spec, ceph-ansible will get a new feature for
deploying Ceph in containers so that it will not generate the
ceph.conf on the container host’s /etc directory. The same option will
need to apply when generating Ceph key rings, which will be stored in
/etc/ceph in the container, but not on the container host.&lt;/p&gt;
&lt;p&gt;Because Mistral on the undercloud runs the ansible playbooks, the
user “mistral” on the undercloud will be the one that SSH’s into the
overcloud nodes to run ansible playbooks. Care will need to be taken
to ensure that user doesn’t make changes which are out of scope.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;From a high level, this proposal is an alternative to the current
method of deploying Ceph with TripleO and offers the benefits listed
in the problem description.&lt;/p&gt;
&lt;p&gt;From a lower level, how this proposal is implemented as described in
the Workflow section should be considered.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;In a split-stack scenario, after the hardware has been provisioned
by the first Heat stack and before the configuration Heat stack is
created, a Mistral workflow like the one in the POC &lt;a class="footnote-reference brackets" href="#id21" id="id12" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; could be
run to configure Ceph on the Ceph nodes. This scenario would be
more similar to the one where TripleO is deployed using the TripleO
Heat Templates environment file puppet-ceph-external.yaml. This
could be an alternative to a new OS::Mistral::WorkflowExecution Heat
resource &lt;a class="footnote-reference brackets" href="#id23" id="id13" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Trigger the ceph-ansible deployment before the OpenStack deployment.
In the initial workflow section, it is proposed that “A new
Heat Resource is created which starts a Mistral workflow to Deploy
Ceph”. This may be difficult because, in general, composable services
currently define snippets of puppet data which are then later combined
to define the deployment steps, and there is not yet a way to support
running an arbitrary Mistral workflow at a given step of a deployment.
Thus, the Mistral workflow could be started first and then it could
wait for what is described in step 6 of the overview section.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A new SSH key pair will be created on the undercloud and will be
accessible in the Mistral environment via a command like
&lt;cite&gt;mistral environment-get ssh_keys_ceph&lt;/cite&gt;. The public key of this
pair will be installed in the heat-admin user’s authorized_keys
file on all Overcloud nodes which will be Ceph Monitors or OSDs.
This process will follow the same pattern used to create the SSH
keys used for TripleO validations so nothing new would happen in
that respect; it is just another instance of the same type of process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An additional tool would do configuration on the Overcloud, though
the impact of this should be isolated via Containers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Regardless of how Ceph services are configured, they require changes
to the firewall. This spec will implement parity in firewalling for
Ceph services &lt;a class="footnote-reference brackets" href="#id24" id="id14" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The following applies to the undercloud:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Mistral will need to run an additional workflow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Heat’s role in deploying Ceph would be lessened so the Heat stack
would be smaller.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Ceph will be deployed using a method that is proven but whose
integration is new to TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;fultonj&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;gfidente
leseb
colonwq
d0ugal (to review Mistral workflows/actions)&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Prototype a Mistral workflow to independently install Ceph on
Overcloud nodes &lt;a class="footnote-reference brackets" href="#id21" id="id15" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. [done]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prototype a Heat Resource to start an independent Mistral Workflow
&lt;a class="footnote-reference brackets" href="#id23" id="id16" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. [done]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Expand mistral-ansible-actions with necessary options (fultonj)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parameterize the Mistral workflow (fultonj)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update and have merged Heat CustomResource &lt;a class="footnote-reference brackets" href="#id23" id="id17" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; (gfidente)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Have ceph-ansible create openstack pools and keys for containerized
deployments: &lt;a class="reference external" href="https://github.com/ceph/ceph-ansible/issues/1321"&gt;https://github.com/ceph/ceph-ansible/issues/1321&lt;/a&gt; (leseb)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get ceph-ansible packaged on ceph.com and pushed to the CentOS CBS
(fultonj / leseb)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make undercloud install produce /usr/share/ceph-ansible by modifying
RDO’s instack RPM’s spec file to add a dependency (fultonj)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Submit mistral workflow and ansible-mistral-actions to
tripleo-common (fultonj)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prototype new service plugin interface that defines per-service
workflows (gfidente / shardy / fultonj)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Submit new services into tht/roles_data.yaml so users can use it.
This should include a change to the tripleo-heat-templates
ci/environments/scenario001-multinode.yaml to include the new
service, e.g. CephMonAnsible so that CI is tested. This may not
work unless it all co-exists in a single overcloud deploy.
If it works, we use it to get started. The initial plan is for
scenario004 to keep using puppet-ceph.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement the “deleting the Ceph cluster” scenario&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement the “adding additional Ceph OSDs to existing OpenStack and
Ceph clusters” scenario&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement the “removing Ceph OSD nodes” scenario&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement the “performing maintenance on Ceph OSD nodes” scenario (optional)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Containerization of the Ceph services provided by ceph-ansible is
used to ensure the configuration tools aren’t competing. This
will need to be compatible with the Containerize TripleO spec
&lt;a class="footnote-reference brackets" href="#id26" id="id18" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;A change to tripleo-heat-templates’ scenario001-multinode.yaml will be
submitted which includes deployment of the new services CephMonAnsible
and CephOsdAnsible (note that these role names will be changed when
fully working). This testing scenario may not work unless all of the
services can co-exist; however, preliminary testing indicates that
this will work. Initially scenario004 will not be modified and will
keep using puppet-ceph. We may start by changing the ovb-nonha scenario
first, as we believe this may be faster. Once CI moves to
tripleo-quickstart and a containers-only scenario exists, we
will want to add a hyperconverged containerized deployment too.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;A new TripleO Backend Configuration document “Deploying Ceph with
ceph-ansible” would be required.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id19" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id3"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/ceph/ceph-ansible"&gt;ceph-ansible&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id20" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/Tendrl/documentation"&gt;Tendrl&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id21" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id4"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id12"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id15"&gt;3&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/fultonj/tripleo-ceph-ansible"&gt;POC tripleo-ceph-ansible&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id22" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/d0ugal/mistral-ansible-actions"&gt;Experimental mistral-ansible-actions project&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id23" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id6"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id7"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id8"&gt;3&lt;/a&gt;,&lt;a role="doc-backlink" href="#id13"&gt;4&lt;/a&gt;,&lt;a role="doc-backlink" href="#id16"&gt;5&lt;/a&gt;,&lt;a role="doc-backlink" href="#id17"&gt;6&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/420664"&gt;Proposed new Heat resource OS::Mistral::WorkflowExecution&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id24" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id9"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id10"&gt;2&lt;/a&gt;,&lt;a role="doc-backlink" href="#id14"&gt;3&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;cite&gt;These firewall changes must be managed in a way that does not conflict with TripleO’s mechanism for managing host firewall rules and should be done before the Ceph servers are deployed. We are working on a solution to this problem.&lt;/cite&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id25" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id11"&gt;8&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/416421/29/docker/docker-puppet.py"&gt;Configuration files generated by Puppet and passed to a containerized service via a config volume&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id26" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id18"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/223182"&gt;Spec to Containerize TripleO&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Fri, 10 Apr 2020 00:00:00 </pubDate></item><item><title>Deriving TripleO Parameters</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/tripleo-derive-parameters.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-derive-parameters"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-derive-parameters&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This specification proposes a generic interface for automatically
populating environment files with parameters which were derived from
formulas; where the formula’s input came from introspected hardware
data, workload type, and deployment type. It also provides specific
examples of how this interface may be used to improve deployment of
overclouds to be used in DPDK or HCI usecases. Finally, it proposes
how this generic interface may be shared and extended by operators
who optionally choose to have certain parameters prescribed so that
future systems tuning expertise may be integrated into TripleO.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Operators must populate parameters for a deployment which may be
specific to hardware and deployment type. The hardware information
of a node is available to the operator once the introspection of the
node is completed. However, the current process requires that the
operator manually read the introspected data, make decisions based on
that data and then update the parameters in an environment file. This
makes deployment preparation unnecessarily complex.&lt;/p&gt;
&lt;p&gt;For example, when deploying for DPDK, the operator must provide the
list of CPUs which should be assigned to the DPDK Poll Mode Driver
(PMD) and the CPUs should be provided from the same NUMA node on which
the DPDK interface is present. In order to provide the correct
parameters, the operator must cross check all of these details.&lt;/p&gt;
&lt;p&gt;Another example is the deployment of HCI overclouds, which run both
Nova compute and Ceph OSD services on the same nodes. In order to
prevent contention between compute and storage services, the operator
may manually apply formulas, provided by performance tuning experts,
which take into account available hardware, type of workload, and type
of deployment, and then after computing the appropriate parameters
based on those formulas, manually store them in environment files.&lt;/p&gt;
&lt;p&gt;In addition to the complexity of the DPDK or HCI usecase, knowing the
process to assign CPUs to the DPDK Poll Mode Driver or isolate compute
and storage resources for HCI is, in itself, another problem. Rather
than document the process and expect operators to follow it, the
process should be captured in a high level language with a generic
interface so that performance tuning experts may easily share new
similar processes for other use cases with operators.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;This spec aims to make three changes to TripleO outlined below.&lt;/p&gt;
&lt;section id="mistral-workflows-to-derive-parameters"&gt;
&lt;h3&gt;Mistral Workflows to Derive Parameters&lt;/h3&gt;
&lt;p&gt;A group of Mistral workflows will be added for features whose
deployment parameters are complex to determine. Features like DPDK,
SR-IOV and HCI require input from the introspection data to be
analyzed to compute the deployment parameters. This derive parameters
workflow will apply a default set of computational formulas to the
introspected data. Thus, there will be a hard dependency
on node introspection for this workflow to be successful.&lt;/p&gt;
&lt;p&gt;During the first iterations, all the roles in a deployment will be
analyzed to find any service associated with the role that requires
parameter derivation. The options for making this association, and the
choice made for the current iteration, are discussed in the section
&lt;a class="reference internal" href="#workflow-association-with-services"&gt;Workflow Association with Services&lt;/a&gt; below.&lt;/p&gt;
&lt;p&gt;This workflow assumes that all the nodes in a role have a homogeneous
hardware specification, so the introspection data of the first node will
be used for processing the parameters for the entire role. This will
be reexamined in later iterations, based on the need for node-specific
derivations. The workflow will consider the flavor-profile association
and the nova placement scheduler to identify the nodes associated with a
role.&lt;/p&gt;
&lt;p&gt;Role-specific parameters are an important requirement for this workflow.
If there are multiple roles with the same service (feature) enabled,
the parameters which are derived from this workflow will be applied
only on the corresponding role.&lt;/p&gt;
&lt;p&gt;The input sources for these workflows are the ironic database and ironic
introspection data stored in Swift, in addition to the deployment plan stored
in Swift. Computations done to derive the parameters within the Mistral
workflow will be implemented in YAQL. These computations will live in a
separate workflow per feature so that the formulas can be customized. An
operator who needs to modify the default formulas only has to update that
workflow with the customized formula.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="applying-derived-parameters-to-the-overcloud"&gt;
&lt;h3&gt;Applying Derived Parameters to the Overcloud&lt;/h3&gt;
&lt;p&gt;In order for the resulting parameters to be applied to the overcloud,
the deployment plan, which is stored in Swift on the undercloud,
will be modified with the Mistral &lt;cite&gt;tripleo.parameters.update&lt;/cite&gt; action
or similar.&lt;/p&gt;
&lt;p&gt;The methods for providing input for derivation and the update of
parameters which are derivation output should be consistent with the
Deployment Plan Management specification &lt;a class="footnote-reference brackets" href="#id7" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;. The implementation of
this spec with respect to the interfaces to set and get parameters may
change as it is updated. However, the basic workflow should remain the
same.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="trigger-mistral-workflows-with-tripleo"&gt;
&lt;h3&gt;Trigger Mistral Workflows with TripleO&lt;/h3&gt;
&lt;p&gt;Assuming that workflows are in place to derive parameters and update the
deployment plan as described in the previous two sections, an operator may
take advantage of this optional feature by enabling it via
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;plan-environment.yaml&lt;/span&gt;&lt;/code&gt;. A new section &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;workflow_parameters&lt;/span&gt;&lt;/code&gt; will be added to
the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;plan-environment.yaml&lt;/span&gt;&lt;/code&gt; file to accommodate the additional parameters
required for executing workflows. With this additional section, we can ensure
that the workflow-specific parameters are provided only to the workflow,
without polluting the Heat environments. It will also be possible to provide
multiple plan environment files, which will be merged in the CLI before plan
creation.&lt;/p&gt;
&lt;p&gt;These additional parameters will be read by the derive params workflow
directly from the merged &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;plan-environment.yaml&lt;/span&gt;&lt;/code&gt; file stored in Swift.&lt;/p&gt;
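&lt;p&gt;The merge step can be sketched as follows. This is a minimal illustration, assuming that each plan environment file has already been parsed into a dictionary and that later files take precedence on conflicting keys; the spec does not define the precedence, and the helper names are hypothetical.&lt;/p&gt;

```python
import copy


def _deep_merge(target, source):
    """Recursively copy source into target; source wins on conflicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _deep_merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def merge_plan_environments(environments):
    """Merge parsed plan environment files in the order given (assumed)."""
    merged = {}
    for environment in environments:
        _deep_merge(merged, environment)
    return merged


WORKFLOW = "tripleo.workflows.v1.derive_parameters"

# Two hypothetical plan environment files, already parsed from YAML.
base = {"workflow_parameters": {WORKFLOW: {"HciProfile": "default"}}}
override = {"workflow_parameters": {WORKFLOW: {"HciProfile": "many_small_vms"}}}

merged = merge_plan_environments([base, override])
workflow_inputs = merged["workflow_parameters"][WORKFLOW]
print(workflow_inputs["HciProfile"])
```

&lt;p&gt;A recursive merge keeps unrelated keys from every file, so one file can set the HCI inputs while another sets inputs for a different workflow.&lt;/p&gt;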
&lt;p&gt;It is possible to modify the created plan, or the profile-node
association, after the derive parameters workflow has run. For
now, we assume that no such alterations are made; after the initial
iteration, validations will be added to fail the deployment if
they are.&lt;/p&gt;
&lt;p&gt;An operator should be able to derive and view parameters without doing a
deployment; e.g. “generate deployment plan”. If the calculation is done as
part of the plan creation, it would be possible to preview the calculated
values. Alternatively the workflow could be run independently of the overcloud
deployment, but how that will fit with the UI workflow needs to be determined.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="usecase-1-derivation-of-dpdk-parameters"&gt;
&lt;h2&gt;Usecase 1: Derivation of DPDK Parameters&lt;/h2&gt;
&lt;p&gt;A part of the Mistral workflow which uses YAQL to derive DPDK
parameters based on introspection data, including NUMA &lt;a class="footnote-reference brackets" href="#id8" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, exists
and may be seen on GitHub &lt;a class="footnote-reference brackets" href="#id9" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
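&lt;p&gt;The NUMA cross-check described in the problem description can be illustrated in Python. This is only a sketch: the layout of the introspection data (a list of CPUs with their NUMA node and thread siblings, and a list of NICs with their NUMA node) is an assumption, and the function name and the choice of the lowest-numbered core are hypothetical rather than taken from the actual workflow.&lt;/p&gt;

```python
def pmd_cpus_for_interface(numa_topology, dpdk_nic, physical_cores=1):
    """Pick PMD CPUs from the NUMA node that hosts the DPDK interface."""
    nic_nodes = {nic["name"]: nic["numa_node"] for nic in numa_topology["nics"]}
    node = nic_nodes[dpdk_nic]

    # Keep only the physical cores that live on the same NUMA node.
    local_cores = [c for c in numa_topology["cpus"] if c["numa_node"] == node]
    local_cores.sort(key=lambda core: core["cpu"])

    # Take whole cores so that all thread siblings stay together.
    chosen = local_cores[:physical_cores]
    return sorted(t for core in chosen for t in core["thread_siblings"])


# Hypothetical introspection data for a two-node, eight-thread machine.
numa_topology = {
    "cpus": [
        {"cpu": 0, "numa_node": 0, "thread_siblings": [0, 4]},
        {"cpu": 1, "numa_node": 0, "thread_siblings": [1, 5]},
        {"cpu": 2, "numa_node": 1, "thread_siblings": [2, 6]},
        {"cpu": 3, "numa_node": 1, "thread_siblings": [3, 7]},
    ],
    "nics": [{"name": "ens1f0", "numa_node": 1}],
}

pmd_cpus = pmd_cpus_for_interface(numa_topology, "ens1f0")
print(",".join(str(cpu) for cpu in pmd_cpus))
```

&lt;p&gt;Here the interface sits on NUMA node 1, so only threads of cores on that node are returned.&lt;/p&gt;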
&lt;/section&gt;
&lt;section id="usecase-2-derivation-profiles-for-hci"&gt;
&lt;h2&gt;Usecase 2: Derivation Profiles for HCI&lt;/h2&gt;
&lt;p&gt;This usecase covers HCI, running Ceph OSD and Nova Compute on the same node. The HCI
derive parameters workflow works with a default set of configs to categorize
the type of workload that the role will host. An option will be provided to
override the default configs with deployment-specific configs via
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;plan-environment.yaml&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In case of HCI deployment, the additional plan environment used for the
deployment will look like:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;workflow_parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workflows&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;derive_parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# HCI Derive Parameters&lt;/span&gt;
    &lt;span class="n"&gt;HciProfile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;nfv_default&lt;/span&gt;
    &lt;span class="n"&gt;HciProfileConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;average_guest_memory_size_in_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;
        &lt;span class="n"&gt;average_guest_CPU_utilization_percentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
      &lt;span class="n"&gt;many_small_vms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;average_guest_memory_size_in_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
        &lt;span class="n"&gt;average_guest_CPU_utilization_percentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
      &lt;span class="n"&gt;few_large_vms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;average_guest_memory_size_in_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;
        &lt;span class="n"&gt;average_guest_CPU_utilization_percentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
      &lt;span class="n"&gt;nfv_default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;average_guest_memory_size_in_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;
        &lt;span class="n"&gt;average_guest_CPU_utilization_percentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In the above example, the section &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;workflow_parameters&lt;/span&gt;&lt;/code&gt; is used to provide
input parameters for the workflow in order to isolate Nova and Ceph
resources while maximizing performance for different types of guest
workloads. An example of the derivation done with these inputs is
provided in nova_mem_cpu_calc.py on GitHub &lt;a class="footnote-reference brackets" href="#id10" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
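&lt;p&gt;A derivation of this kind might look like the sketch below. The constants (memory and cores reserved per OSD, per-guest overhead), the formulas, and the hardware inputs are assumptions chosen for illustration; they should be checked against the referenced script before being relied upon.&lt;/p&gt;

```python
MB_PER_GB = 1000
GB_PER_OSD = 3               # assumed memory set aside for each Ceph OSD
GB_OVERHEAD_PER_GUEST = 0.5  # assumed hypervisor overhead per guest
CORES_PER_OSD = 1.0          # assumed CPU cores set aside for each OSD


def derive_hci_parameters(mem_gb, cores, osds, guest_mem_gb, guest_cpu_util):
    """Return (reserved_host_memory_mb, cpu_allocation_ratio) for one node."""
    ceph_mem_gb = GB_PER_OSD * osds
    # Number of guests that fit in the memory left over after Ceph.
    guests = int((mem_gb - ceph_mem_gb) / (guest_mem_gb + GB_OVERHEAD_PER_GUEST))
    reserved_mb = MB_PER_GB * (ceph_mem_gb + guests * GB_OVERHEAD_PER_GUEST)

    # Cores left for guests, scaled up by how lightly each guest uses its vCPUs.
    non_ceph_cores = cores - CORES_PER_OSD * osds
    guest_vcpus = non_ceph_cores / guest_cpu_util
    return int(reserved_mb), guest_vcpus / cores


# Hypothetical node: 256 GB RAM, 56 cores, 10 OSDs, 2 GB guests at 10% CPU use.
reserved_mb, ratio = derive_hci_parameters(256, 56, 10, 2, 0.1)
print(reserved_mb, round(ratio, 6))
```

&lt;p&gt;With these assumed inputs the sketch yields 75000 MB and a ratio of 8.214286, the same example values quoted in the Alternatives section.&lt;/p&gt;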
&lt;/section&gt;
&lt;section id="other-integration-of-parameter-derivation-with-tripleo"&gt;
&lt;h2&gt;Other Integration of Parameter Derivation with TripleO&lt;/h2&gt;
&lt;section id="users-may-still-override-parameters"&gt;
&lt;h3&gt;Users may still override parameters&lt;/h3&gt;
&lt;p&gt;If a workflow derives a parameter, e.g. cpu_allocation_ratio, but the
operator specified a cpu_allocation_ratio in their overcloud deploy,
then the operator provided value is given priority over the derived
value. This may be useful in a case where an operator wants all of the
values that were derived but just wants to override a subset of those
parameters.&lt;/p&gt;
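&lt;p&gt;The precedence rule amounts to a dictionary update in which operator values are applied last. The sketch below is illustrative only; the function name and the sample values are hypothetical.&lt;/p&gt;

```python
def apply_operator_overrides(derived, operator_supplied):
    """Combine parameters so that operator-supplied values win."""
    final = dict(derived)            # start from everything that was derived
    final.update(operator_supplied)  # operator values replace derived ones
    return final


derived = {"cpu_allocation_ratio": 8.2, "reserved_host_memory": 75000}
operator_supplied = {"cpu_allocation_ratio": 4.0}

final = apply_operator_overrides(derived, operator_supplied)
print(final)
```

&lt;p&gt;The operator overrides only the ratio, while the derived memory reservation is kept.&lt;/p&gt;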
&lt;/section&gt;
&lt;section id="handling-cross-dependency-resources"&gt;
&lt;h3&gt;Handling Cross Dependency Resources&lt;/h3&gt;
&lt;p&gt;It is possible that multiple workflows will end up deriving parameters based
on the same resource (like CPUs). When this happens, it is important to have a
specific order for the workflows to be run considering the priority.&lt;/p&gt;
&lt;p&gt;For example, consider CPUs as a resource and how they should be shared
between DPDK and HCI. DPDK requires a set of dedicated CPUs for Poll Mode
Drivers (NeutronDpdkCoreList), which should not be used for host processes
(ComputeHostCpusList) or guest VMs (NovaVcpuPinSet). HCI requires the CPU
allocation ratio to be derived based on the number of CPUs that are available
for guest VMs (NovaVcpuPinSet). Priority is given to DPDK, followed by HOST
parameters and then HCI parameters. In this case, the workflow execution
starts with a pool of CPUs, then:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;DPDK: Allocate NeutronDpdkCoreList&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HOST: Allocate ComputeHostCpusList&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HOST: Allocate NovaVcpuPinSet&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HCI: Fix the cpu allocation ratio based on NovaVcpuPinSet&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
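&lt;p&gt;The ordering above can be sketched as a single pass over a shared pool. This is a simplified illustration: the CPU counts are hypothetical, and a real workflow would also take NUMA placement and thread siblings into account.&lt;/p&gt;

```python
def allocate_cpus(pool, requests):
    """Hand out CPUs from a shared pool in the order the requests are listed."""
    remaining = sorted(pool)
    allocation = {}
    for parameter, count in requests:
        if count is None:
            count = len(remaining)  # this parameter takes whatever is left
        if count not in range(len(remaining) + 1):
            raise ValueError("not enough CPUs left for " + parameter)
        allocation[parameter] = remaining[:count]
        remaining = remaining[count:]
    return allocation


requests = [
    ("NeutronDpdkCoreList", 2),  # DPDK is served first
    ("ComputeHostCpusList", 2),  # then host processes
    ("NovaVcpuPinSet", None),    # guest VMs receive the rest
]

allocation = allocate_cpus(range(8), requests)
# HCI runs last and only reads the result of the earlier allocations.
guest_cpu_count = len(allocation["NovaVcpuPinSet"])
print(allocation, guest_cpu_count)
```

&lt;p&gt;Because each step removes its CPUs from the pool, a lower-priority workflow can never claim a CPU that a higher-priority one has already taken.&lt;/p&gt;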
&lt;/section&gt;
&lt;section id="derived-parameters-for-specific-services-or-roles"&gt;
&lt;h3&gt;Derived parameters for specific services or roles&lt;/h3&gt;
&lt;p&gt;If an operator only wants to configure Enhanced Platform Awareness (EPA)
features like CPU pinning or huge pages, which are not associated with any
feature like DPDK or HCI, then the derivation should be associated with just
the compute service.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="workflow-association-with-services"&gt;
&lt;h3&gt;Workflow Association with Services&lt;/h3&gt;
&lt;p&gt;The optimal way to associate the derived parameter workflows with
services is to get the list of the enabled services on a given role
by previewing the Heat stack. With the current limitations in Heat, it is
not possible to fetch the enabled services list on a role. Thus, a new
parameter will be introduced on the service which is associated with a
derive parameters workflow. If this parameter is referenced in the
Heat resource tree, on a specific role, then the corresponding derive
parameters workflow will be invoked. For example, the DPDK service will
have a new parameter “EnableDpdkDerivation” to enable the DPDK
specific workflows.&lt;/p&gt;
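&lt;p&gt;The association can be pictured as a lookup from enabling parameter to workflow. In the sketch below only “EnableDpdkDerivation” comes from this spec; the HCI parameter, the workflow names, the role names, and the idea that the referenced parameters arrive as a set per role are all assumptions for illustration.&lt;/p&gt;

```python
# Enabling parameter mapped to the workflow it triggers (names hypothetical).
FEATURE_WORKFLOWS = {
    "EnableDpdkDerivation": "dpdk_derive_params",
    "EnableHciDerivation": "hci_derive_params",
}


def workflows_for_role(referenced_parameters):
    """Return the feature workflows to invoke for one role."""
    return [
        workflow
        for parameter, workflow in FEATURE_WORKFLOWS.items()
        if parameter in referenced_parameters
    ]


# Parameters referenced in the resource tree of each role (sample data).
roles = {
    "ComputeOvsDpdk": {"EnableDpdkDerivation", "NovaVcpuPinSet"},
    "Controller": {"NtpServer"},
}

selected = {role: workflows_for_role(params) for role, params in roles.items()}
print(selected)
```

&lt;p&gt;A role that references no enabling parameter simply gets an empty list, so no derivation runs for it.&lt;/p&gt;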
&lt;/section&gt;
&lt;section id="future-integration-with-tripleo-ui"&gt;
&lt;h3&gt;Future integration with TripleO UI&lt;/h3&gt;
&lt;p&gt;If this spec were implemented and merged, then the TripleO UI could
have a menu item for a deployment, e.g. HCI, in which the deployer may
choose a derivation profile and then deploy an overcloud with that
derivation profile.&lt;/p&gt;
&lt;p&gt;The UI could better integrate with this feature by allowing a deployer
to use a graphical slider to vary an existing derivation profile and
then save that derivation profile with a new name. The following
cycle could be used by the deployer to tune the overcloud.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Choose a deployment, e.g. HCI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose an HCI profile, e.g. many_small_vms&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Benchmark the planned workload on the deployed overcloud&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the sliders to change aspects of the derivation profile&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the deployment and re-run the benchmark&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Repeat as needed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Save the new derivation profile as the one to be deployed in the field&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The implementation of this spec would enable the TripleO UI to support
the above.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The simplest alternative is for operators to determine what tunings
are appropriate by testing or reading documentation and then implement
those tunings in the appropriate Heat environment files. For example,
in an HCI scenario, an operator could run nova_mem_cpu_calc.py &lt;a class="footnote-reference brackets" href="#id10" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;
and then create a Heat environment file like the following with its
output and then deploy the overcloud and directly reference this
file:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;parameter_defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;ExtraConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;reserved_host_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;75000&lt;/span&gt;
    &lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cpu_allocation_ratio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;8.2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This could translate into a variety of overrides which would require
initiative on the operator’s part.&lt;/p&gt;
&lt;p&gt;Another alternative is to write separate tools which generate the
desired Heat templates but don’t integrate them with TripleO. For
example, nova_mem_cpu_calc.py and similar, would produce a set of Heat
environment files as output which the operator would then include
instead of output containing the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;nova.conf reserved_host_memory_mb = 75000 MB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nova.conf cpu_allocation_ratio = 8.214286&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When evaluating the above, keep in mind that only two parameters for
CPU allocation and memory are being provided as an example, but that
a tuned deployment may contain more.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;There is no security impact from this change as it sits at a higher
level to automate, via Mistral and Heat, features that already exist.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Operators need not manually derive the deployment parameters based on the
introspection or hardware specification data, as it is automatically derived
with pre-defined formulas.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The deployment and update of an overcloud may take slightly longer if
an operator uses this feature because an additional Mistral workflow
needs to run to perform some analytics before applying configuration
updates. However, the performance of the overcloud would be improved
because this proposal aims to make it easier to tune the overcloud for
performance.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;A new configuration option is being added, but it has to be explicitly
enabled, and thus it would not take immediate effect after it is merged.
However, if a deployer chooses to use it and there is a bug in it, then
it could affect the overcloud deployment. If a deployer uses this new
option, and had a deploy in which they set a parameter directly,
e.g. the Nova cpu_allocation_ratio, then that parameter may be
overridden by a particular tuning profile. A deployer should be
aware of this when using the proposed feature.&lt;/p&gt;
&lt;p&gt;The config options being added will ship with a variety of defaults
based on deployments put under load in a lab. The main idea is to make
different sets of defaults, which were produced under these
conditions, available. The example discussed in this proposal and to
be made available on completion could be extended.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;This spec proposes modifying the deployment plan which, if there was a
bug, could introduce problems into a deployment. However, because the
new feature is completely optional, a developer could easily disable
it.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignees:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;skramaja
fultonj&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jpalanis
abishop
shardy
gfidente&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Derive Params start workflow to find list of roles&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Workflow run for each role to fetch the introspection data and trigger
individual features workflow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Workflow to identify if a service associated with a features workflow is
enabled in a role&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DPDK Workflow: Analyze and finalize the format of the input data (jpalanis)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DPDK Workflow: Parameter deriving workflow (jpalanis)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HCI Workflow: Run a workflow that calculates the parameters (abishop)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SR-IOV Workflow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EPA Features Workflow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the derive params workflow from CLI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add a CI scenario that tests whether the workflow produced the expected output&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;NUMA Topology in introspection data (ironic-python-agent) &lt;a class="footnote-reference brackets" href="#id11" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Create a new scenario in the TripleO CI in which a deployment is done
using all of the available options within a derivation profile called
all-derivation-options. A CI test would need to be added that would
test this new feature by doing the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A deployment would be done with the all-derivation-options profile&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployment would be checked to confirm that all of the configurations had been made&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the configuration changes are in place, then the test passed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Else the test failed&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Relating the above to the HCI use case, the test could verify one of
two options:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;A Heat environment file created with the following syntactically
valid Heat:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;parameter_defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;ExtraConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;reserved_host_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;75000&lt;/span&gt;
    &lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cpu_allocation_ratio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;8.2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The compute node was deployed such that the commands below return
something like the following:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="nd"&gt;@overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;osd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="c1"&gt;# grep reserved_host_memory /etc/nova/nova.conf&lt;/span&gt;
&lt;span class="n"&gt;reserved_host_memory_mb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;75000&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="nd"&gt;@overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;osd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="c1"&gt;# grep cpu_allocation_ratio /etc/nova/nova.conf&lt;/span&gt;
&lt;span class="n"&gt;cpu_allocation_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;8.2&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="nd"&gt;@overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;osd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Option 1 would put less load on the CI infrastructure and produce a
faster test, but Option 2 tests the full scenario.&lt;/p&gt;
&lt;p&gt;If a new derived parameter option is added, then the all-derivation-options
profile would need to be updated and the test would need to be updated
to verify that the new options were set.&lt;/p&gt;
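&lt;p&gt;As an illustration of the Option 2 check above, the following is a minimal
sketch and not part of the proposed implementation. It assumes both options
live in the [DEFAULT] section of nova.conf; the function name
derived_params_applied and the EXPECTED mapping are hypothetical, with values
copied from the example output in this spec.&lt;/p&gt;

```python
import configparser

# Hypothetical expected values, copied from the example output in this spec.
EXPECTED = {
    "reserved_host_memory_mb": "75000",
    "cpu_allocation_ratio": "8.2",
}


def derived_params_applied(nova_conf_text, expected=EXPECTED):
    """Return True if every expected option is set in [DEFAULT] of nova.conf.

    Assumption: the options are defined in the [DEFAULT] section.
    """
    # Interpolation is disabled so that literal '%' characters do not raise.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    defaults = parser.defaults()
    return all(defaults.get(key) == value for key, value in expected.items())


sample = "[DEFAULT]\nreserved_host_memory_mb=75000\ncpu_allocation_ratio=8.2\n"
print(derived_params_applied(sample))
```

&lt;p&gt;A CI job could read /etc/nova/nova.conf from the compute node and pass its
contents to a check of this kind; when a new derived parameter is added, only
the expected mapping would need to grow.&lt;/p&gt;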
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;A new chapter would be added to the TripleO documentation on deploying with
derivation profiles.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id7" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/438918"&gt;Deployment Plan Management specification&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id8" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/396147"&gt;Spec for Ironic to retrieve NUMA node info&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id9" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/Jaganathancse/Jagan/tree/master/mistral-workflow"&gt;https://github.com/Jaganathancse/Jagan/tree/master/mistral-workflow&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id10" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id4"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id5"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/RHsyseng/hci/blob/master/scripts/nova_mem_cpu_calc.py"&gt;nova_mem_cpu_calc.py&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id11" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id6"&gt;5&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/424729/"&gt;NUMA Topology in introspection data (ironic-python-agent)&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Fri, 10 Apr 2020 00:00:00 </pubDate></item><item><title>Tech Debt Tracking</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/policy/tech-debt-tracking.html</link><description>
 
&lt;section id="goal"&gt;
&lt;h2&gt;Goal&lt;/h2&gt;
&lt;p&gt;Provide a basic policy for tracking and being able to reference tech debt
related changes in TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;During the development of TripleO, sometimes tech debt is acquired due to time
or resource constraints that may exist. Without a solid way of tracking when
we intentionally add tech debt, it is hard to quantify how much tech debt is
being self-inflicted. Additionally, tech debt gets lost in the code, and without
a way to remember where we left it, it is almost impossible to remember when
and where we need to go back to fix some known issues.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="tracking-code-tech-debt-with-bugs"&gt;
&lt;h3&gt;Tracking Code Tech Debt with Bugs&lt;/h3&gt;
&lt;p&gt;Intentionally created tech debt items should have a bug &lt;a class="footnote-reference brackets" href="#id2" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; created with the
&lt;cite&gt;tech-debt&lt;/cite&gt; tag added to it. Additionally the commit message of the change
should reference this &lt;cite&gt;tech-debt&lt;/cite&gt; bug and if possible a comment should be added
into the code referencing who put it in there.&lt;/p&gt;
&lt;p&gt;Example Commit Message:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;Always&lt;/span&gt; &lt;span class="n"&gt;exit&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;because&lt;/span&gt; &lt;span class="n"&gt;foo&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;currently&lt;/span&gt; &lt;span class="n"&gt;broken&lt;/span&gt;

&lt;span class="n"&gt;We&lt;/span&gt; &lt;span class="n"&gt;need&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;always&lt;/span&gt; &lt;span class="n"&gt;exit&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;because&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;foo&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="n"&gt;eroneously&lt;/span&gt; &lt;span class="n"&gt;returns&lt;/span&gt;
&lt;span class="mf"&gt;42.&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="n"&gt;bug&lt;/span&gt; &lt;span class="n"&gt;has&lt;/span&gt; &lt;span class="n"&gt;been&lt;/span&gt; &lt;span class="n"&gt;reported&lt;/span&gt; &lt;span class="n"&gt;upstream&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="n"&gt;we&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;sure&lt;/span&gt; &lt;span class="n"&gt;when&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;
&lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt; &lt;span class="n"&gt;addressed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Related&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Bug&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;#1234567&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Example Comment:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="c1"&gt;# TODO(aschultz): We need this because the world is falling apart LP#1234567&lt;/span&gt;
&lt;span class="n"&gt;foo&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;exit&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
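&lt;p&gt;Comments that follow the convention above can be located mechanically. The
sketch below is illustrative only and is not part of this policy; the function
name find_tech_debt and the exact comment pattern it matches are assumptions
based on the example comment.&lt;/p&gt;

```python
import re

# Matches comments such as "# TODO(aschultz): ... LP#1234567".
# Group 1 is the author, group 2 is the Launchpad bug number.
TODO_PATTERN = re.compile(r"#\s*TODO\(([^)]+)\):.*?LP#(\d+)")


def find_tech_debt(source_text):
    """Return (line_number, author, bug_id) for each tagged TODO comment."""
    found = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            found.append((number, match.group(1), int(match.group(2))))
    return found


example = (
    "# TODO(aschultz): We need this because the world is falling apart LP#1234567\n"
    "foo || exit 0\n"
)
print(find_tech_debt(example))
```

&lt;p&gt;The bug numbers collected this way could be compared against the bugs
carrying the tech-debt tag to spot comments whose bug has already been
closed.&lt;/p&gt;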
&lt;/section&gt;
&lt;section id="triaging-bugs-as-tech-debt"&gt;
&lt;h3&gt;Triaging Bugs as Tech Debt&lt;/h3&gt;
&lt;p&gt;If an end user reports a bug that we know is a tech debt item, the person
triaging the bug should add the &lt;cite&gt;tech-debt&lt;/cite&gt; tag to the bug.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="reporting-tech-debt"&gt;
&lt;h3&gt;Reporting Tech Debt&lt;/h3&gt;
&lt;p&gt;With the &lt;cite&gt;tech-debt&lt;/cite&gt; tag on bugs, we should be able to keep a running tally
of the bugs we have labeled and should report on this every release milestone
to see trends around how much is being added and when. As part of our triaging
of bugs, we should strive to add net-zero tech-debt bugs each major release if
possible.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We continue to not track any of these things and continue to rely on developers
to remember when they add code and circle back around to fix it themselves or
when other developers find the issue and remove it.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Core reviewers should request that any tech debt be appropriately tracked and
feel free to -1 any patches that are adding tech debt without proper
attribution.&lt;/p&gt;
&lt;section id="author-s"&gt;
&lt;h3&gt;Author(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary author:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;aschultz&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="milestones"&gt;
&lt;h3&gt;Milestones&lt;/h3&gt;
&lt;p&gt;Queens-1&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;aschultz to create tech-debt tag in Launchpad.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id2" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/tripleo-docs/latest/contributor/contributions.html#reporting-bugs"&gt;https://docs.openstack.org/tripleo-docs/latest/contributor/contributions.html#reporting-bugs&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;section id="revision-history"&gt;
&lt;h2&gt;Revision History&lt;/h2&gt;
&lt;table class="docutils align-default" id="id3"&gt;
&lt;caption&gt;&lt;span class="caption-text"&gt;Revisions&lt;/span&gt;&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Release Name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Queens&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Introduced&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0
Unported License.
&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Fri, 10 Apr 2020 00:00:00 </pubDate></item><item><title>Fast-forward upgrades</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/queens/fast-forward-upgrades.html</link><description>&lt;dl&gt;
&lt;dt&gt;.&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0 Unported
License.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;section id="fast-forward-upgrades"&gt;
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/fast-forward-upgrades"&gt;https://blueprints.launchpad.net/tripleo/+spec/fast-forward-upgrades&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Fast-forward upgrades are upgrades that move an environment from release &lt;cite&gt;N&lt;/cite&gt; to
&lt;cite&gt;N+X&lt;/cite&gt; in a single step, where &lt;cite&gt;X&lt;/cite&gt; is greater than &lt;cite&gt;1&lt;/cite&gt; and for fast-forward
upgrades is typically &lt;cite&gt;3&lt;/cite&gt;. This spec outlines how such upgrades can be
orchestrated by TripleO between the Newton and Queens OpenStack releases.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;OpenStack upgrades are often seen by operators as problematic &lt;a class="footnote-reference brackets" href="#id11" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; &lt;a class="footnote-reference brackets" href="#id12" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.
Whilst TripleO upgrades have improved greatly over recent cycles, many operators
are still reluctant to upgrade with each new release.&lt;/p&gt;
&lt;p&gt;This often leads to a situation where environments remain on the release used
when first deployed. Eventually this release will come to the end of its
supported life (EOL), forcing operators to upgrade to the next supported
release. There can also be restrictions imposed on an environment that simply
do not allow upgrades to be performed ahead of the EOL of a given release,
forcing operators to again wait until the release hits EOL.&lt;/p&gt;
&lt;p&gt;While it is possible to then linearly upgrade to a supported release with the
cadence of upstream releases, downstream distributions providing long-term
support (LTS) releases may not be able to provide the same path once the
initially installed release reaches EOL. Operators in such a situation may also
want to avoid running multiple lengthy linear upgrades to reach their desired
release.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;TripleO support for fast-forward upgrades will first target &lt;cite&gt;N&lt;/cite&gt; to &lt;cite&gt;N+3&lt;/cite&gt;
upgrades between the Newton and Queens releases:&lt;/p&gt;
&lt;div class="highlight-bash notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;Newton    Ocata     Pike       Queens
+-----+   +-----+   +-----+    +-----+
&lt;span class="p"&gt;|&lt;/span&gt;     &lt;span class="p"&gt;|&lt;/span&gt;   &lt;span class="p"&gt;|&lt;/span&gt; N+1 &lt;span class="p"&gt;|&lt;/span&gt;   &lt;span class="p"&gt;|&lt;/span&gt; N+2 &lt;span class="p"&gt;|&lt;/span&gt;    &lt;span class="p"&gt;|&lt;/span&gt;     &lt;span class="p"&gt;|&lt;/span&gt;
&lt;span class="p"&gt;|&lt;/span&gt;  N  &lt;span class="p"&gt;|&lt;/span&gt; ---------------------&amp;gt; &lt;span class="p"&gt;|&lt;/span&gt; N+3 &lt;span class="p"&gt;|&lt;/span&gt;
&lt;span class="p"&gt;|&lt;/span&gt;     &lt;span class="p"&gt;|&lt;/span&gt;   &lt;span class="p"&gt;|&lt;/span&gt;     &lt;span class="p"&gt;|&lt;/span&gt;   &lt;span class="p"&gt;|&lt;/span&gt;     &lt;span class="p"&gt;|&lt;/span&gt;    &lt;span class="p"&gt;|&lt;/span&gt;     &lt;span class="p"&gt;|&lt;/span&gt;
+-----+   +-----+   +-----+    +-----+
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This will give the impression of the Ocata and Pike releases being skipped with
the fast-forward upgrade moving the environment from Newton to Queens. In
reality, as OpenStack projects with the &lt;cite&gt;supports-upgrade&lt;/cite&gt; tag are only required
to support &lt;cite&gt;N&lt;/cite&gt; to &lt;cite&gt;N+1&lt;/cite&gt; upgrades &lt;a class="footnote-reference brackets" href="#id13" id="id3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, the upgrade will still need to move
through each release, completing database migrations and a limited set of other
tasks.&lt;/p&gt;
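&lt;p&gt;The sequence of single-release hops that the fast-forward upgrade still has
to walk through can be sketched as follows. This is an illustration only; the
RELEASES list and the upgrade_hops helper are hypothetical names, and the
release order is taken from the diagram above.&lt;/p&gt;

```python
# Release order taken from the diagram above.
RELEASES = ["newton", "ocata", "pike", "queens"]


def upgrade_hops(source, target, releases=RELEASES):
    """Return the (from, to) single-release hops between two releases."""
    start = releases.index(source)
    end = releases.index(target)
    path = releases[start:end + 1]
    # Pair each release with its successor; the result is empty when the
    # target is not later than the source.
    hops = list(zip(path, path[1:]))
    if not hops:
        raise ValueError("target must be a later release than source")
    return hops


for hop in upgrade_hops("newton", "queens"):
    print(hop)
```

&lt;p&gt;Each hop corresponds to one N to N+1 step during which database migrations
and the limited set of other tasks would be run.&lt;/p&gt;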
&lt;/section&gt;
&lt;section id="caveats"&gt;
&lt;h3&gt;Caveats&lt;/h3&gt;
&lt;p&gt;Before outlining the suggested changes to TripleO it is worth highlighting the
following caveats for fast-forward upgrades:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The control plane is inaccessible for the duration of the upgrade&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The data plane and active workloads must remain available for the duration of
the upgrade.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="prerequisites"&gt;
&lt;h3&gt;Prerequisites&lt;/h3&gt;
&lt;p&gt;Prior to the overcloud fast-forward upgrade starting the following prerequisite
tasks must be completed:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Rolling minor update of the overcloud on &lt;cite&gt;N&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a normal TripleO overcloud update &lt;a class="footnote-reference brackets" href="#id14" id="id4" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;4&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and should bring each node in
the environment up to the latest supported version of the underlying OS and
pull in the latest packages. Operators can then reboot the nodes as
required. The reboot ensures that the latest kernel, openvswitch, QEMU and any
other reboot-dependent package is reloaded before proceeding with the upgrade.
This can happen well in advance of the overcloud fast-forward upgrade and
should remove the need for additional reboots during the upgrade.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Upgrade undercloud from &lt;cite&gt;N&lt;/cite&gt; to &lt;cite&gt;N+3&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The undercloud also needs to be upgraded to &lt;cite&gt;N+3&lt;/cite&gt; ahead of any overcloud
upgrade. Again this can happen well in advance of the overcloud upgrade. For
the time being this is a traditional, linear upgrade between &lt;cite&gt;N&lt;/cite&gt; and &lt;cite&gt;N+1&lt;/cite&gt;
releases until we reach the target &lt;cite&gt;N+3&lt;/cite&gt; Queens release.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Container images cached prior to the start of the upgrade&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With the introduction of containerised TripleO overclouds in Pike, operators
will need to cache the required container images prior to the fast-forward
upgrade if they wish to end up with a containerised Queens overcloud.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="high-level-flow"&gt;
&lt;h3&gt;High level flow&lt;/h3&gt;
&lt;p&gt;At a high level the following actions will be carried out by the fast-forward
upgrade to move the overcloud from &lt;cite&gt;N&lt;/cite&gt; to &lt;cite&gt;N+3&lt;/cite&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Stop all OpenStack control and compute services across all roles&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This will bring down the OpenStack control plane, leaving infrastructure
services such as the databases running, while allowing any workloads to
continue running without interruption. For HA environments this will disable
the cluster, ensuring that OpenStack services are not restarted.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Upgrade a single host from &lt;cite&gt;N&lt;/cite&gt; to &lt;cite&gt;N+1&lt;/cite&gt; then &lt;cite&gt;N+1&lt;/cite&gt; to &lt;cite&gt;N+2&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As alluded to earlier, OpenStack projects currently only support &lt;cite&gt;N&lt;/cite&gt; to &lt;cite&gt;N+1&lt;/cite&gt;
upgrades and so fast-forward upgrades still need to cycle through each release in
order to complete data migrations and any other tasks that are required before
these migrations can be completed. This part of the upgrade is limited to a
single host per role to ensure this is completed as quickly as possible.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Optional upgrade and deployment of single canary compute host to &lt;cite&gt;N+3&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As fast-forward upgrades aim to ensure workloads are online and accessible
during the upgrade we can optionally upgrade all control service hosting roles
&lt;em&gt;and&lt;/em&gt; a single canary compute to &lt;cite&gt;N+3&lt;/cite&gt; to verify that workloads will remain
active and accessible during the upgrade.&lt;/p&gt;
&lt;p&gt;A canary compute node will be selected at the start of the upgrade and have
instances launched on it to validate that both it and the data plane remain
active during the upgrade. The upgrade will halt if either becomes inaccessible,
with a recovery procedure provided to move all hosts back to &lt;cite&gt;N+1&lt;/cite&gt;
without further disruption to the active workloads on the untouched compute
hosts.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Upgrade and deployment of all roles to &lt;cite&gt;N+3&lt;/cite&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the above optional canary compute host upgrade is not used then the final
action in the fast-forward upgrade will be a traditional &lt;cite&gt;N&lt;/cite&gt; to &lt;cite&gt;N+1&lt;/cite&gt; migration
between &lt;cite&gt;N+2&lt;/cite&gt; and &lt;cite&gt;N+3&lt;/cite&gt; followed by the deployment of all roles on &lt;cite&gt;N+3&lt;/cite&gt;. This
final action is essentially a redeployment of the overcloud to containers on
&lt;cite&gt;N+3&lt;/cite&gt; (Queens) as previously seen when upgrading TripleO environments from
Ocata to Pike.&lt;/p&gt;
&lt;p&gt;A python-tripleoclient command and associated Mistral workflow will control whether
this final step is applied to all roles in parallel (default), all hosts in a
given role or selected hosts in a given role. The latter is useful if a user
wants to control the order in which computes are moved from &lt;cite&gt;N+1&lt;/cite&gt; to &lt;cite&gt;N+3&lt;/cite&gt; etc.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h3&gt;Implementation&lt;/h3&gt;
&lt;p&gt;As with updates &lt;a class="footnote-reference brackets" href="#id15" id="id5" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;5&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and upgrades &lt;a class="footnote-reference brackets" href="#id16" id="id6" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;6&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; specific fast-forward upgrade Ansible
tasks associated with the first two actions above will be introduced into the
&lt;cite&gt;tripleo-heat-templates&lt;/cite&gt; service templates for each service as &lt;cite&gt;RoleConfig&lt;/cite&gt;
outputs.&lt;/p&gt;
&lt;p&gt;As with &lt;cite&gt;upgrade_tasks&lt;/cite&gt; each task is associated with a particular step in the
process. For &lt;cite&gt;fast_forward_upgrade_tasks&lt;/cite&gt; these steps are split between prep
tasks that apply to all hosts and bootstrap tasks that only apply to a single
host for a given role.&lt;/p&gt;
&lt;p&gt;Prep step tasks will map to the following actions:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Step=1: Disable the overall cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step=2: Stop OpenStack services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step=3: Update host repositories&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Bootstrap step tasks will map to the following actions:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Step=4: Take OpenStack DB backups&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step=5: Pre package update commands&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step=6: Update required packages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step=7: Post package update commands&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step=8: OpenStack service DB sync&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step=9: Validation&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As with &lt;cite&gt;update_tasks&lt;/cite&gt; each task will use simple &lt;cite&gt;when&lt;/cite&gt; conditionals to
identify which step and release(s) it is associated with, ensuring these tasks
are executed at the correct point in the upgrade.&lt;/p&gt;
&lt;p&gt;For example, a step 2 &lt;cite&gt;fast_forward_upgrade_tasks&lt;/cite&gt; entry for Ocata is listed below:&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;fast_forward_upgrade_tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Example Ocata step 2 task&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/bin/foo bar&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;when&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;step|int == 2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;release == 'ocata'&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
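&lt;p&gt;The effect of the step and release conditionals can be modelled outside of
Ansible. The sketch below is a simplified illustration and not how TripleO
evaluates the tasks: the task list, the select_tasks and step_kind helpers,
and the representation of the &lt;cite&gt;when&lt;/cite&gt; conditions as plain dictionary fields
are all assumptions made for the example. The step ranges follow the prep and
bootstrap lists above.&lt;/p&gt;

```python
# Hypothetical task list mirroring the YAML structure above; the "when"
# conditions are modelled as plain dictionary fields for illustration.
TASKS = [
    {"name": "Example Ocata step 2 task", "step": 2, "release": "ocata"},
    {"name": "Example Pike step 2 task", "step": 2, "release": "pike"},
    {"name": "Example Ocata step 8 task", "step": 8, "release": "ocata"},
]

PREP_STEPS = range(1, 4)        # steps 1 to 3 apply to all hosts
BOOTSTRAP_STEPS = range(4, 10)  # steps 4 to 9 apply to one host per role


def step_kind(step):
    """Classify a step number as a prep or bootstrap step."""
    if step in PREP_STEPS:
        return "prep"
    if step in BOOTSTRAP_STEPS:
        return "bootstrap"
    raise ValueError("unknown step: %r" % (step,))


def select_tasks(tasks, step, release):
    """Return the names of tasks whose step and release both match."""
    return [
        task["name"]
        for task in tasks
        if task["step"] == step and task["release"] == release
    ]


# Walk every step for one release, as a playbook run for that release would.
for step in range(1, 10):
    for name in select_tasks(TASKS, step, "ocata"):
        print(step, step_kind(step), name)
```

&lt;p&gt;Running the loop once per intermediate release, with the release value
changed each time, mirrors how the step and release variables keep tasks in
the intended order.&lt;/p&gt;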
&lt;p&gt;These tasks will then be collated into role specific Ansible playbooks via the
RoleConfig output of the &lt;cite&gt;overcloud&lt;/cite&gt; heat template, with step and release
variables being fed in to ensure tasks are executed in the correct order.&lt;/p&gt;
&lt;p&gt;As with &lt;cite&gt;major upgrades&lt;/cite&gt; &lt;a class="footnote-reference brackets" href="#id18" id="id7" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;8&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, a new Mistral workflow and tripleoclient command
will be introduced to generate and execute the associated Ansible tasks.&lt;/p&gt;
&lt;div class="highlight-bash notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;openstack overcloud fast-forward-upgrade --templates &lt;span class="o"&gt;[&lt;/span&gt;..path to latest THT..&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                           &lt;span class="o"&gt;[&lt;/span&gt;..original environment arguments..&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                           &lt;span class="o"&gt;[&lt;/span&gt;..new container environment agruments..&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Operators will also be able to generate &lt;a class="footnote-reference brackets" href="#id17" id="id8" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;7&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;, download and review the
playbooks ahead of time using the latest version of &lt;cite&gt;tripleo-heat-templates&lt;/cite&gt;
with the following commands:&lt;/p&gt;
&lt;div class="highlight-bash notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;openstack overcloud deploy --templates &lt;span class="o"&gt;[&lt;/span&gt;..path to latest THT..&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                           &lt;span class="o"&gt;[&lt;/span&gt;..original environment arguments..&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                           &lt;span class="o"&gt;[&lt;/span&gt;..new container environment arguments..&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                           -e environments/fast-forward-upgrade.yaml &lt;span class="se"&gt;\&lt;/span&gt;
                           -e environments/noop-deploy-steps.yaml
openstack overcloud config download
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="dev-workflow"&gt;
&lt;h3&gt;Dev workflow&lt;/h3&gt;
&lt;p&gt;The existing tripleo-upgrade Ansible role will be used to automate the
fast-forward upgrade process for use by developers and CI, including the
initial overcloud minor update, undercloud upgrade to &lt;cite&gt;N+3&lt;/cite&gt; and fast-forward
upgrade itself.&lt;/p&gt;
&lt;p&gt;Developers working on fast_forward_upgrade_tasks will also be able to deploy
minimal overcloud deployments via &lt;cite&gt;tripleo-quickstart&lt;/cite&gt; using release configs
also used by CI.&lt;/p&gt;
&lt;p&gt;Further, when developing tasks, developers will be able to manually render and
run &lt;cite&gt;fast_forward_upgrade_tasks&lt;/cite&gt; as standalone Ansible playbooks, allowing them
to run a subset of the tasks against specific nodes using
&lt;cite&gt;tripleo-ansible-inventory&lt;/cite&gt;. Examples of how to do this will be documented
to ensure a smooth development experience for anyone looking to
contribute tasks for specific services.&lt;/p&gt;
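&lt;p&gt;A standalone playbook for this kind of manual run could look roughly like the
following sketch. The playbook name, node name, rendered tasks file and
variable values are all assumptions for illustration; such a playbook would be
run with &lt;cite&gt;ansible-playbook&lt;/cite&gt; using &lt;cite&gt;tripleo-ansible-inventory&lt;/cite&gt; as the
inventory source.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;---
# run_nova_ffu_step.yaml (hypothetical file name)
- hosts: overcloud-controller-0  # assumed node name from the inventory
  become: true
  vars:
    step: 1          # assumed: run a single step only
    release: ocata   # assumed: selects tasks guarded by release == 'ocata'
  tasks:
    - name: Run only the rendered Nova tasks
      include_tasks: nova_fast_forward_upgrade_tasks.yaml  # hypothetical rendered file
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;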
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Continue to force operators to upgrade linearly through each major release&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parallel cloud migrations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The control plane will be down for the duration of the upgrade&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The data plane and workloads will remain up.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Third party service template providers will need to provide
fast_forward_upgrade_tasks in their THT service configurations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="id9"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;lbezdick&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;marios&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;chem&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other contributors:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;shardy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;lyarwood&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Introduce fast_forward_upgrades_playbook.yaml to RoleConfig&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introduce fast_forward_upgrade_tasks in each service template&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introduce a python-tripleoclient command and associated Mistral workflow.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;TripleO - Ansible upgrade Workflow with UI integration &lt;a class="footnote-reference brackets" href="#id19" id="id10" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;9&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The new major upgrade workflow being introduced for Pike to Queens upgrades
will affect what fast-forward upgrades to Queens look like. At
present the high level flow for fast-forward upgrades assumes that we can reuse
the current &lt;cite&gt;upgrade_tasks&lt;/cite&gt; between N+2 and N+3 to disable and then potentially
remove baremetal services. This is likely to change as the major upgrade
workflow is introduced and so it is likely that these steps will need to be
encoded in &lt;cite&gt;fast_forward_upgrade_tasks&lt;/cite&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Third party CI jobs will need to be created to test Newton to Queens using
RDO given the upstream EOL of stable/newton with the release of Pike.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;These jobs should cover the initial undercloud upgrade, overcloud upgrade and
optional canary compute node checks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An additional third party CI job will be required to verify that a Queens
undercloud can correctly manage a Newton overcloud, allowing the separation
of the undercloud upgrade and fast-forward upgrade discussed under
prerequisites.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, minimal overcloud roles should be used to verify the upgrade for
certain services. For example, when changes are made to the
&lt;cite&gt;fast_forward_upgrade_tasks&lt;/cite&gt; of Nova via changes to
&lt;cite&gt;docker/services/nova-*.yaml&lt;/cite&gt; files then a basic overcloud deployment of
Keystone, Glance, Swift, Cinder, Neutron and Nova could be used to quickly
verify the changes in regards to fast-forward upgrades.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This will require extensive developer and user documentation to be written,
most likely in a new section of the docs specifically detailing the
fast-forward upgrade flow.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id11" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/MEX-ops-migrations-upgrades"&gt;https://etherpad.openstack.org/p/MEX-ops-migrations-upgrades&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id12" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/BOS-forum-skip-level-upgrading"&gt;https://etherpad.openstack.org/p/BOS-forum-skip-level-upgrading&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id13" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://governance.openstack.org/tc/reference/tags/assert_supports-upgrade.html"&gt;https://governance.openstack.org/tc/reference/tags/assert_supports-upgrade.html&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id14" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id4"&gt;4&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://tripleo.org/install/post_deployment/package_update.html"&gt;http://tripleo.org/install/post_deployment/package_update.html&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id15" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id5"&gt;5&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/README.rst#update-steps"&gt;https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/README.rst#update-steps&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id16" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id6"&gt;6&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/README.rst#upgrade-steps"&gt;https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/README.rst#upgrade-steps&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id17" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id8"&gt;7&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/495658/"&gt;https://review.openstack.org/#/c/495658/&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id18" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id7"&gt;8&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/q/topic:major-upgrade+(status:open+OR+status:merged)"&gt;https://review.openstack.org/#/q/topic:major-upgrade+(status:open+OR+status:merged)&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id19" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id10"&gt;9&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/queens/tripleo_ansible_upgrades_workflow.html"&gt;https://specs.openstack.org/openstack/tripleo-specs/specs/queens/tripleo_ansible_upgrades_workflow.html&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
&lt;/section&gt;
</description><pubDate>Fri, 10 Apr 2020 00:00:00 </pubDate></item><item><title>Replace Mistral with Ansible</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ussuri/mistral-to-ansible.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-mistral-to-ansible"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-mistral-to-ansible&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The goal of this proposal is to replace Mistral in TripleO with Ansible
playbooks.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Mistral was originally added to take the place of an “API” and provide common
logic for tripleoclient and TripleO UI. After the TripleO UI was removed, the
only consumer of Mistral is tripleoclient. This means that Mistral now adds
unnecessary overhead and complexity.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Remove Mistral from the TripleO undercloud and convert all Mistral workbooks,
workflows and actions to Ansible playbooks within tripleo-ansible. tripleoclient
will then be updated to execute the Ansible playbooks rather than the Mistral
workflows.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The only other alternative candidate is to keep using Mistral and accept the
complexity and reinvest in the project.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Because this work rewrites Mistral workflows that currently handle
passwords, tokens and secrets, care will be needed. However, the logic
should remain largely the same.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With the eventual removal of Mistral and Zaqar two complex systems can be
removed which will reduce the surface area for security issues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The new Ansible playbooks will only use the undercloud OpenStack APIs,
therefore they shouldn’t create a new attack vector.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Upgrades will need to remove Mistral services and make sure the Ansible
playbooks are in place.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Older versions of tripleoclient will no longer work with the undercloud as
they will expect Mistral to be present.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Most of the data in Mistral is ephemeral, but some longer term data is stored
in Mistral environments. This data will likely be moved to Swift.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The output of CLI commands will change format. For example, the Mistral
workflow ID will no longer be included and other Ansible specific output will
be included. Where possible we will favour streaming Ansible output to the
user, making tripleoclient very light and transparent.&lt;/p&gt;
&lt;p&gt;Some CLI commands, such as introspection, will need to fundamentally change
their output. Currently they send real time updates and progress to the client
with Zaqar. Despite moving the execution locally, we are unable to easily get
messages from an Ansible playbook while it is running. This means the user may
need to wait a long time before they get any feedback.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;There is no expected performance impact as the internal logic should be largely
the same. However, the Ansible playbooks will be executed where the user runs
the CLI rather than by the Mistral server. This could then be slower or faster
depending on the resources available to the machine and the network connection
to the undercloud.&lt;/p&gt;
&lt;p&gt;The undercloud itself should have more resources available since it won’t be
running Mistral or Zaqar.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;If anyone is using the Mistral workflows directly, they will stop working. We
currently don’t know of any users doing this and it was never documented.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will need to contribute to Ansible playbooks instead of Mistral
workflows. As the pool of developers who know Ansible is larger than the pool
that knows Mistral, this should make development easier. Ansible contributions
will likely be expected to include unit/functional tests.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;d0ugal&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Other contributors:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;apetrich&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ekultails&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sshnaidm&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cloudnull&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Storyboard is being used to track this work:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&lt;a class="reference external" href="https://storyboard.openstack.org/#!/board/208"&gt;https://storyboard.openstack.org/#!/board/208&lt;/a&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Migrate the Mistral workflows to Ansible playbooks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migrate custom Mistral actions to native Ansible components, or replace them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove Mistral and Zaqar.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update documentation specific to Mistral.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extend our auto-documentation plugin to support playbooks within
tripleo-ansible. This will allow us to generate API documentation for all
playbooks committed to tripleo-ansible, which will include our new &lt;cite&gt;cli&lt;/cite&gt;
prefixed playbooks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="converting-mistral-workflows-to-ansible"&gt;
&lt;h4&gt;Converting Mistral Workflows to Ansible&lt;/h4&gt;
&lt;p&gt;For each Mistral workflow the following steps need to be taken to port them
to Ansible.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Re-write the Mistral workflow logic in Ansible, reusing the Mistral Python
actions where appropriate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update python-tripleoclient to use the new Ansible playbooks. It should
prefer showing the native Ansible output rather than attempting to replicate
the previous output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Workflows and related code should be deleted from tripleo-common.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A complete example can be seen for the &lt;cite&gt;openstack undercloud backup&lt;/cite&gt; command.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/playbooks/cli-undercloud-backup.yaml"&gt;Ansible Playbook&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/#/c/665690/"&gt;Updated tripleoclient&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/#/c/703966/"&gt;Removal of all workflow code&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Since this change will largely be a re-working of existing code, the changes
will be tested by the existing CI coverage. This coverage should be improved and
expanded as needed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Any references to Mistral will need to be updated to point to the new Ansible
playbooks.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.opendev.org/#/q/topic:mistral-removal+OR+topic:mistral_to_ansible"&gt;https://review.opendev.org/#/q/topic:mistral-removal+OR+topic:mistral_to_ansible&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bugs?field.tag=mistral-removal"&gt;https://bugs.launchpad.net/tripleo/+bugs?field.tag=mistral-removal&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010384.html"&gt;http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010384.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://storyboard.openstack.org/#!/board/208"&gt;https://storyboard.openstack.org/#!/board/208&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Thu, 23 Jan 2020 00:00:00 </pubDate></item><item><title>Bug tags</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/policy/bug-tagging.html</link><description>
 
&lt;p&gt;The main TripleO bug tracker is used to keep track of bugs for multiple
projects that are all parts of TripleO. In order to reduce confusion,
we are using a list of approved tags to categorize them.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Given the heavily interconnected nature of the various TripleO
projects, there is a desire to track all the related bugs in a single
bug tracker. However, it can then be difficult to narrow
down the bugs related to a specific aspect of the project. Launchpad
bug tags can help us here.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="policy"&gt;
&lt;h2&gt;Policy&lt;/h2&gt;
&lt;p&gt;The Launchpad official tags list for TripleO contains the following
tags. Keeping them official in Launchpad means the tags will
auto-complete when users start writing them. A bug report can have any
combination of these tags, or none.&lt;/p&gt;
&lt;p&gt;Proposing new tags should be done via policy update (proposing a change
to this file). Once such a change is merged, a member of the driver
team will create/delete the tag in Launchpad.&lt;/p&gt;
&lt;section id="tags"&gt;
&lt;h3&gt;Tags&lt;/h3&gt;
&lt;table class="docutils align-default"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Tag&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;alert&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;For critical bugs requiring immediate attention. Triggers IRC notification&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;ci&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting the Continuous Integration system&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;ci-reproducer&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting local recreation of Continuous Integration environments&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;config-agent&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting os-collect-config, os-refresh-config, os-apply-config&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;containers&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting container based deployments&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;depcheck&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting 3rd party dependencies, for example ceph-ansible, podman&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;deployment-time&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting deployment time&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;documentation&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug that is specific to documentation issues&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;edge&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug related to edge computing cases, for example network or scale issues&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;i18n&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug related to internationalization issues&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;low-hanging-fruit&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A good starter bug for newcomers&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;networking&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug that is specific to networking issues&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;promotion-blocker&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Bug that is blocking promotion job(s)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;puppet&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting the TripleO Puppet templates&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;quickstart&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting tripleo-quickstart or tripleo-quickstart-extras&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;selinux&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug related to SELinux&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tech-debt&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug related to TripleO tech debt&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tempest&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug related to tempest running on TripleO&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tripleo-common&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting tripleo-common&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;tripleo-heat-templates&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting the TripleO Heat Templates&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;tripleoclient&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting python-tripleoclient&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;ui&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting the TripleO UI&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;upgrade&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting upgrades&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;ux&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting user experience&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;validations&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting the Validations&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;workflows&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;A bug affecting the Mistral workflows&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;xxx-backport-potential&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Cherry-pick request for the stable team&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives-history"&gt;
&lt;h2&gt;Alternatives &amp;amp; History&lt;/h2&gt;
&lt;p&gt;The current ad-hoc system is not working well, as people use
inconsistent subject tags and other markers. Likewise, because the list
is not official, Launchpad tags do not autocomplete and quickly
become inconsistent, and hence less useful.&lt;/p&gt;
&lt;p&gt;We could use the wiki to keep track of the tags, but the future of the
wiki is in doubt. By making tags an official policy, changes to the
list can be reviewed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="author-s"&gt;
&lt;h3&gt;Author(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary author:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jpichon&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="milestones"&gt;
&lt;h3&gt;Milestones&lt;/h3&gt;
&lt;p&gt;Newton-3&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;Once the policy has merged, someone with the appropriate Launchpad
permissions should create the tags and an email should be sent to
openstack-dev referring to this policy.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Launchpad page to manage the tag list:
&lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+manage-official-tags"&gt;https://bugs.launchpad.net/tripleo/+manage-official-tags&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thread that led to the creation of this policy:
&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-July/099444.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2016-July/099444.html&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="revision-history"&gt;
&lt;h2&gt;Revision History&lt;/h2&gt;
&lt;table class="docutils align-default" id="id1"&gt;
&lt;caption&gt;&lt;span class="caption-text"&gt;Revisions&lt;/span&gt;&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Release Name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Newton&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Introduced&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;Queens&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;tech-debt tag added&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0
Unported License.
&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Thu, 09 Jan 2020 00:00:00 </pubDate></item><item><title>tripleo-operator-ansible - Ansible roles and modules to interact with TripleO</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ussuri/tripleo-operator-ansible.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-operator-ansible"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-operator-ansible&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As an operator of a TripleO deployment, I would like to be able to consume
supported ansible roles and modules that let me perform TripleO related
actions in my automation.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The existing &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-ansible"&gt;tripleo-ansible&lt;/a&gt; repository currently contains roles, plugins
and modules that are consumed by TripleO to perform the actual deployments and
configurations. As these are internal implementations to TripleO, we would not
want operators consuming these directly. The &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-ansible"&gt;tripleo-ansible&lt;/a&gt; repository is
also branched which means that the contents within the repo and packaging
are specific to a singular release. This spec propose that we create a new
repository targeted for external automation for any supported version.&lt;/p&gt;
&lt;p&gt;Currently, operators do not have a set of official ansible roles and modules
that can be used to deploy and manage TripleO environments. As a result,
multiple groups who wish to manage their TripleO environments in an automated
fashion have implemented the same roles independently, e.g.
&lt;a class="reference external" href="https://opendev.org/openstack/tripleo-quickstart"&gt;tripleo-quickstart&lt;/a&gt;, &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-quickstart-extras"&gt;tripleo-quickstart-extras&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/redhat-openstack/infrared"&gt;infrared&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/cjeanner/tripleo-lab"&gt;tripleo-lab&lt;/a&gt;.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;TripleO should provide a set of ansible roles and modules that can be used
by the end user to deploy and manage an Undercloud and Overcloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO should provide a set of ansible roles and modules that can be used
to perform scaling actions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO should provide a set of ansible roles and modules that can be used
to perform update and upgrade actions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;TripleO should create a new repository where ansible roles, plugins and
modules that wrap TripleO actions can be stored. This repository should be
branchless so that the roles can be used with any currently supported version
of TripleO. The goal is to only provide automation for TripleO actions and not
necessarily other cloud related actions. The roles in this new repository
should only be targeted to providing an automation interface for the existing
&lt;a class="reference external" href="https://docs.openstack.org/python-tripleoclient/latest/index.html"&gt;tripleoclient commands&lt;/a&gt;. The repository may provide basic setup actions such
as implementing a wrapper around &lt;a class="reference external" href="https://opendev.org/openstack/tripleo-repos"&gt;tripleo-repos&lt;/a&gt;. The roles contained in this
repository should not implement additional day 2 cloud related operations such
as creating servers, networks or other resources on the deployed Overcloud.&lt;/p&gt;
&lt;p&gt;This new repository should be able to be packaged and distributed via an RPM
as well as being able to be published to &lt;a class="reference external" href="https://galaxy.ansible.com/"&gt;Ansible Galaxy&lt;/a&gt;. The structure
of this new repository should be Ansible &lt;a class="reference external" href="https://docs.ansible.com/ansible/latest/dev_guide/developing_collections.html"&gt;collections&lt;/a&gt; compatible.&lt;/p&gt;
&lt;p&gt;The target audience of the new repository would be end users (operators,
developers, etc) who want to write automation around TripleO. The new
repository and roles would be our officially supported automation artifacts.
One way to describe this would be like providing Puppet modules for a given
piece of software so that it can be consumed by users who use Puppet.  The
existing CLI will continue to function for users who do not want to use
Ansible to automate TripleO deployments or who wish to continue to use the CLI
by hand.  The roles are not a replacement for the CLI, but only provide an
official set of roles for people who use Ansible.&lt;/p&gt;
&lt;p&gt;The integration point for Ansible users would be the roles provided via
tripleo-operator-ansible.  We would expect users to perform actions by
including our provided roles.&lt;/p&gt;
&lt;p&gt;An example playbook for a user could be:&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hosts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;undercloud&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;gather_facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;include_role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tripleo_undercloud&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;tasks_from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;install&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;vars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;tripleo_undercloud_configuration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="nt"&gt;DEFAULT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;                 &lt;/span&gt;&lt;span class="nt"&gt;undercloud_debug&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;True&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;                 &lt;/span&gt;&lt;span class="nt"&gt;local_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;192.168.50.1/24&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Copy nodes.json&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;src&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/home/myuser/my-environment-nodes.json&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/home/stack/nodes.json&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;include_role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tripleo_baremetal&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;tasks_from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;introspection&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;vars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;tripleo_baremetal_nodes_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/home/stack/nodes.json&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;tripleo_baremetal_introspection_provide&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;True&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;tripleo_baremetal_introspection_all_managable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;True&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;include_role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;tripleo_overcloud&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;tasks_from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;deploy&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;vars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;tripleo_overcloud_environment_files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;network_isolation.yaml&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ceph_storage.yaml&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;tripleo_overcloud_roles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Controller&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Networker&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;CephStorage&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The internals of these roles could possibly proceed in two different paths:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement simple wrappers around the invocation of the actual TripleO
commands using execs, shell or commands. This path will likely be the fastest
path to have an initial implementation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Install undercloud&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"openstack&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;undercloud&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;install&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tripleo_undercloud_install_options&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;chdir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;"{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tripleo_undercloud_install_directory&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
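&lt;p&gt;A wrapper of this kind mostly has to translate role variables into a command line. The following is a minimal sketch of that translation, not the implementation proposed by this spec. The helper name is hypothetical, and the option names are taken from the attributes used elsewhere in this spec rather than verified against any particular tripleoclient release.&lt;/p&gt;

```python
import shlex


def build_undercloud_install_command(options):
    """Turn a dict of role options into an argv list for the CLI.

    Keys use underscores (Ansible variable style) and are converted to
    dashed long options. True becomes a bare flag, False and None are
    skipped, and any other value is passed as "--key value".
    """
    argv = ["openstack", "undercloud", "install"]
    for key in sorted(options):
        value = options[key]
        flag = "--" + key.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is False or value is None:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


# Illustrative values only; check the option names against the CLI in use.
command = build_undercloud_install_command(
    {"dry_run": True, "no_validations": True, "force_stack_update": False}
)
print(shlex.join(command))
```

&lt;p&gt;Building a list instead of a single string keeps quoting out of the role logic; the list is only joined into a shell-safe string when it is displayed.&lt;/p&gt;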
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement a python wrapper to call into the provided tripleoclient classes.
This path may be a longer term goal as we may be able to provide better
testing by using modules.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="ch"&gt;#!/usr/bin/python&lt;/span&gt;

&lt;span class="c1"&gt;# import the python-tripleoclient&lt;/span&gt;
&lt;span class="c1"&gt;# undercloud cli&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tripleoclient.v1&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;undercloud&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;shlex&lt;/span&gt;

&lt;span class="c1"&gt;# See the following for details&lt;/span&gt;
&lt;span class="c1"&gt;# https://opendev.org/openstack/python-tripleoclient/src/branch/&lt;/span&gt;
&lt;span class="c1"&gt;# master/tripleoclient/v1/undercloud.py&lt;/span&gt;

&lt;span class="c1"&gt;# setup the osc command&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Arg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;verbose_level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;


&lt;span class="c1"&gt;# instantiate the undercloud install command&lt;/span&gt;
&lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;undercloud&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InstallUndercloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tripleo'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Arg&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

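&lt;span class="c1"&gt;# prog_name = 'openstack undercloud install'&lt;/span&gt;
&lt;span class="c1"&gt;# parse an empty argument list to get a namespace populated with defaults&lt;/span&gt;
&lt;span class="n"&gt;tripleo_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'openstack undercloud install'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;([])&lt;/span&gt;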

&lt;span class="c1"&gt;# read the argument string from the arguments file&lt;/span&gt;
&lt;span class="n"&gt;args_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;args_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# For this module, we're going to do key=value style arguments.&lt;/span&gt;
&lt;span class="n"&gt;arguments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shlex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="c1"&gt;# ignore any arguments without an equals in it&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="s2"&gt;"="&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# the key 'dry_run' selects whether the install&lt;/span&gt;
        &lt;span class="c1"&gt;# is only simulated; any value other than "True" disables it&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"dry_run"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"True"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;tripleo_args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dry_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;tripleo_args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dry_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;

        &lt;span class="n"&gt;tripleo_args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;force_stack_validations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;
        &lt;span class="n"&gt;tripleo_args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;no_validations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;
        &lt;span class="n"&gt;tripleo_args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;force_stack_update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;
        &lt;span class="n"&gt;tripleo_args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inflight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;

        &lt;span class="c1"&gt;# execute the install via python-tripleoclient&lt;/span&gt;
        &lt;span class="n"&gt;rc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;take_action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tripleo_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rc&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="s2"&gt;"failed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s2"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"failed tripleo undercloud install"&lt;/span&gt;
            &lt;span class="p"&gt;}))&lt;/span&gt;
            &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="s2"&gt;"changed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"SUCCESS"&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Install undercloud&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;tripleo_undercloud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;install&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;bar&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;These implementations will need to be evaluated to understand which works
best when attempting to support multiple versions of TripleO where options
may or may not be available. An example of this is a CLI parameter that is
supported in versions &amp;gt;= Stein but not in earlier releases.&lt;/p&gt;
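&lt;p&gt;Whichever path is chosen, a branchless repository needs some way to drop options that a given release does not understand. The following is a minimal sketch of one such filter. The mapping of options to releases is a hypothetical placeholder and does not record when any real option was introduced.&lt;/p&gt;

```python
# Release names in chronological order; extend as new releases appear.
RELEASES = ["queens", "rocky", "stein", "train"]

# Hypothetical data: option name mapped to the first release accepting it.
OPTION_INTRODUCED_IN = {
    "inflight": "stein",
}


def supported_options(options, release):
    """Return only the options that the given release understands."""
    available = set(RELEASES[: RELEASES.index(release) + 1])
    kept = {}
    for name, value in options.items():
        # Options without an entry are assumed to exist in every release.
        first_release = OPTION_INTRODUCED_IN.get(name, RELEASES[0])
        if first_release in available:
            kept[name] = value
    return kept


requested = {"dry_run": True, "inflight": True}
print(supported_options(requested, "rocky"))
print(supported_options(requested, "train"))
```

&lt;p&gt;With this placeholder data the first call keeps only &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;dry_run&lt;/span&gt;&lt;/code&gt;, while the second keeps both options.&lt;/p&gt;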
&lt;p&gt;The goal is to have a complete set of roles to do basic deployments within
a single cycle. We should be able to iterate on the internals of the roles
once we have established a basic set to prove out the concept. More complex
actions or other version support may follow on in later cycles.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Do nothing and continue to have multiple tools re-implement the actions in
ansible roles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pick a singular implementation from the existing set and merge them together
within this existing tool. This however may include additional actions that
are outside of the scope of the TripleO management.  This may also limit the
integration by others if established interfaces are too opinionated.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;There should be no upgrade impact other than pulling in the upgrade related
actions into this repository.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will need to ensure the supported roles are updated if the cli
or other actions are updated with new options or patterns.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;mwhahaha&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;weshay
emilienm
cloudnull&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;The existing roles should be evaluated to see if they can be reused and pulled
into the new repository.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create the new tripleo-operator-ansible repository&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Establish CI and testing framework for the new repository&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Evaluate and pull in existing roles if possible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Initial implementation may only be a basic wrapper over the cli&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update tripleo-quickstart to leverage the newly provided roles and remove
the roles they replace.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;If there are OpenStack service related actions that need to occur, we may need
to investigate the inclusion of OpenStackSDK, shade or other upstream related
tools.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The new repository should have molecule testing for any new role created.
Additionally once tripleo-quickstart begins to consume the roles we will need
to ensure that other deployment related CI jobs are included in the testing
matrix.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The roles should be documented (preferably through automation) so that
operators can consume them.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 12 Nov 2019 00:00:00 </pubDate></item><item><title>Scaling with the Ansible Inventory</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ussuri/scaling-with-ansible-inventory.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/scaling-with-Ansible-inventory"&gt;https://blueprints.launchpad.net/tripleo/scaling-with-Ansible-inventory&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Scaling an existing deployment should be possible by adding new host
definitions directly to the Ansible inventory, and not having to increase the
&amp;lt;Role&amp;gt;Count parameters.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently to scale a deployment, a Heat stack update is required. The stack
update reflects the new desired node count of each role, which is then
represented in the generated Ansible inventory. The inventory file is then used
by the config-download process when ansible-playbook is executed to perform the
software configuration on each node.&lt;/p&gt;
&lt;p&gt;Updating the Heat stack with the new desired node count has posed some
scaling challenges. Heat creates a set of resources associated with each node.
As the number of nodes in a deployment increases, Heat has more and more
resources to manage.&lt;/p&gt;
&lt;p&gt;As the stack size grows, Heat must be tuned with software configurations or
horizontally scaled with additional engine workers. However, horizontal scaling
of Heat workers will only help so much as eventually other service workers
would need to be scaled as well, such as database, messaging, or Keystone
worker processes. Having to increasingly scale worker processes results in
additional physical resource consumption.&lt;/p&gt;
&lt;p&gt;Heat performance also begins to degrade as stack size increases. It takes
longer and longer for stack operations to complete as node count increases.
Stack operations often take many hours, which is usually
outside the range of typical maintenance windows.&lt;/p&gt;
&lt;p&gt;It is also hard to predict what changes Heat will make. Often, no changes are
desired other than to scale out to new nodes. However, unintended template
changes or user errors, such as forgetting to pass environment files, pose
additional unnecessary risk to the scaling operation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The proposed change would allow for users to directly add new node definitions
to the Ansible inventory by way of a new Heat parameter to allow for scaling
services onto those new nodes. No change in the &amp;lt;Role&amp;gt;Count parameters would be
required.&lt;/p&gt;
&lt;p&gt;A minimum set of data would be required when adding a new node to the Ansible
inventory. Presently, this includes the TripleO role, and an IP address on each
network that is used by that role.&lt;/p&gt;
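&lt;p&gt;The minimum data requirement can be checked before a node is accepted into the inventory. The following is a minimal sketch under assumed data shapes: the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;role&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ips&lt;/span&gt;&lt;/code&gt; keys, the function name and the role-to-network mapping are all hypothetical, since this spec does not define a schema.&lt;/p&gt;

```python
def missing_node_fields(node, role_networks):
    """Return a list describing what a node definition still lacks.

    node is a dict with a "role" string and an "ips" dict keyed by
    network name. role_networks maps each already defined role to the
    networks it uses.
    """
    role = node.get("role")
    if not role:
        return ["role"]
    if role not in role_networks:
        return ["role " + role + " is not defined in the deployment"]
    ips = node.get("ips", {})
    problems = []
    for network in role_networks[role]:
        if network not in ips:
            problems.append("ip for network " + network)
    return problems


# Illustrative data only.
role_networks = {"Compute": ["ctlplane", "internal_api", "tenant"]}
node = {"role": "Compute", "ips": {"ctlplane": "192.168.24.30"}}
print(missing_node_fields(node, role_networks))
```

&lt;p&gt;Rejecting unknown roles in the same check reflects the restriction that only already defined roles can be scaled this way.&lt;/p&gt;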
&lt;p&gt;Only scaling of already defined roles will be possible with this method.
Defining new roles would still require a full Heat stack update that defines
the new role.&lt;/p&gt;
&lt;p&gt;Once the new node(s) are added to the inventory, ansible-playbook could be
rerun with the config-download directory to scale the software services out
on to the new nodes.&lt;/p&gt;
&lt;p&gt;As increasing the node count in the Heat stack operation won’t be necessary
when scaling, if baremetal provisioning is required for the new nodes, then
this work depends on the nova-less-deploy work:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html"&gt;https://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Once baremetal provisioning is migrated out of Heat with the above work, then
new nodes can be provisioned with those new workflows before adding them
directly to the Ansible inventory.&lt;/p&gt;
&lt;p&gt;Since new nodes added directly to the Ansible inventory would still be
consuming IPs from the subnet ranges defined for the overcloud networks,
Neutron needs to be made aware of those assignments so that there are no
overlapping IP addresses. This could be done with a new interface in
tripleo-heat-templates that allows for specifying the extra node inventory
data. The parameter would be called &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt;. The templates would
take care of operating on that input and creating the appropriate Neutron ports
to correspond to the IP addresses specified in the data.&lt;/p&gt;
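&lt;p&gt;The overlap concern can be illustrated without Neutron. The following is a minimal sketch of the check that creating Neutron ports would enforce, using only the Python standard library. The function name and the sample addresses are hypothetical; in the proposed design the templates would create the ports, and Neutron itself would reject conflicting addresses.&lt;/p&gt;

```python
import ipaddress


def check_extra_ips(subnet_cidr, allocated, requested):
    """Split requested IPs into usable addresses and rejected ones.

    An address is rejected when it lies outside the subnet or is
    already allocated, including by an earlier entry in the request.
    """
    subnet = ipaddress.ip_network(subnet_cidr)
    taken = {ipaddress.ip_address(ip) for ip in allocated}
    usable = []
    rejected = []
    for ip in requested:
        address = ipaddress.ip_address(ip)
        if address not in subnet or address in taken:
            rejected.append(ip)
        else:
            usable.append(ip)
            taken.add(address)
    return usable, rejected


# Illustrative addresses only.
usable, rejected = check_extra_ips(
    "192.168.24.0/24",
    allocated=["192.168.24.10", "192.168.24.11"],
    requested=["192.168.24.11", "192.168.24.30", "10.0.0.5"],
)
print(usable, rejected)
```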
&lt;p&gt;When tripleo-ansible-inventory is used to generate the inventory, it would
query Heat as it does today, but also layer in the extra inventory data as
specified by &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt;. The resulting inventory would be a unified
view of all nodes in the deployment.&lt;/p&gt;
&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt; may be a list of files that are consumed with Heat’s
get_file function so that the deployer can keep their inventory data organized
by file.&lt;/p&gt;
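&lt;p&gt;As a rough illustration of the layering described above, the following sketch
merges extra inventory data into a Heat-derived inventory. The data layout
(role name, then a hosts mapping) and the function name are assumptions made
for this example; they are not the actual tripleo-ansible-inventory
implementation or the final format of the parameter.&lt;/p&gt;
&lt;pre&gt;
```python
import copy


def merge_inventory(heat_inventory, extra_inventory):
    """Layer extra per-role hosts on top of the Heat-derived inventory."""
    merged = copy.deepcopy(heat_inventory)
    for role, group in extra_inventory.items():
        if role not in merged:
            # Only roles already defined in the Heat stack can be scaled.
            raise ValueError(f"role {role!r} is not defined in the Heat stack")
        hosts = merged[role].setdefault("hosts", {})
        for name, hostvars in group.get("hosts", {}).items():
            if name in hosts:
                raise ValueError(f"host {name!r} already exists in {role!r}")
            hosts[name] = dict(hostvars)
    return merged


# Hypothetical sample data for illustration only.
heat = {"Compute": {"hosts": {"compute-0": {"ansible_host": "192.168.24.10"}}}}
extra = {"Compute": {"hosts": {"compute-1": {"ansible_host": "192.168.24.50"}}}}
unified = merge_inventory(heat, extra)
print(sorted(unified["Compute"]["hosts"]))
```
&lt;/pre&gt;
&lt;p&gt;Rejecting unknown roles mirrors the constraint that new roles still require a
full Heat stack update.&lt;/p&gt;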
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;This change is primarily targeted at addressing scaling issues around the
Heat stack operation. Alternative methods include using undercloud minions:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/undercloud_minion.html"&gt;https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/undercloud_minion.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Multi-stack/split-controlplane also addresses the issue somewhat by breaking up
the deployment into smaller and more manageable stacks:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_compute_node.html"&gt;https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_compute_node.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;These alternatives are complementary to the proposed solution here, and all of
these solutions can be used together for the greatest benefit.&lt;/p&gt;
&lt;section id="direct-manipulation-of-inventory-data"&gt;
&lt;h4&gt;Direct manipulation of inventory data&lt;/h4&gt;
&lt;p&gt;Another alternative would be to not make use of any new interface in the
templates such as the previously mentioned &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt;. Users could just
update the inventory file manually, or drop inventory files in a specified
location (since Ansible can use a directory as an inventory source).&lt;/p&gt;
&lt;p&gt;The drawback to this approach is that another tool would be necessary to
create associated ports in Neutron so that there are no overlapping IP
addresses. It could also be a manual step, although that is prone to error.&lt;/p&gt;
&lt;p&gt;The advantage of this approach is that it would completely eliminate the stack
update operation as part of the scaling. Not having any stack operation is
appealing in some regards due to the potential to forget environment files or
other user error (out of date templates, etc).&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;IP addresses and hostnames would potentially exist in user managed templates
that have the value for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt;; however, this is no different from
what is present today.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;The upgrade process will need to be aware that not all nodes are represented in
the Heat stack, and some will be represented only in the inventory. This should
not be an issue as long as there is a consistent interface to get a single
unified inventory as there exists now.&lt;/p&gt;
&lt;p&gt;Any changes around creating the unified view of the inventory should be made
within the implementation of that interface (tripleo-ansible-inventory) such
that existing tooling continues to use an inventory that contains all nodes for
a deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Users will potentially have to manage additional environment files for the
extra inventory data.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Performance should be improved during scale out operations.&lt;/p&gt;
&lt;p&gt;However, it should be noted that Ansible will face scaling challenges as well.
While this change does not directly introduce those new challenges, it may
expose them more rapidly as it bypasses the Heat scaling challenges.&lt;/p&gt;
&lt;p&gt;For example, it is not expected that simply adding hundreds or thousands of new
nodes directly to the Ansible inventory means that the scaling operation would
succeed. It would likely expose new scaling challenges in other tooling, such
as the playbook and role tasks or Ansible itself.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Since this proposal is meant to align with the nova-less-deploy, all nodes
(whether they are known to Heat or not) would be unprovisioned if the
deployment is deleted.&lt;/p&gt;
&lt;p&gt;If using pre-provisioned nodes, then there is no change in behavior in that
deleting the Heat stack does not actually “undeploy” any software. This
proposal does not change that behavior.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers could more quickly test scaling by bypassing the Heat stack update
completely if desired, or using the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt; interface.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;James Slagle &amp;lt;&lt;a class="reference external" href="mailto:jslagle%40redhat.com"&gt;jslagle&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add new parameter &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add Heat processing of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;create Neutron ports&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add stack outputs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update tripleo-ansible-inventory to consume from added stack outputs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update HostsEntry to be generic&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Depends on nova-less-deploy work for baremetal provisioning outside of Heat.
If using pre-provisioned nodes, does not depend on nova-less-deploy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All deployment configurations coming out of Heat need to be generic per role.
Most of this work was completed in Train; however, this should be reviewed. For
example, the HostsEntry data is still static and Heat is calculating the node
list. This data needs to be moved to an Ansible template.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
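&lt;p&gt;To illustrate what “generic” hosts entries could mean, the sketch below
derives the entries from the unified inventory instead of from a node list
calculated by Heat. It is written in Python only as a stand-in for the logic an
Ansible template would carry; the inventory layout, the function name, and the
domain are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
def render_hosts_entries(inventory, domain="localdomain"):
    """Build /etc/hosts-style lines for every host in the inventory."""
    lines = []
    for role in sorted(inventory):
        hosts = inventory[role].get("hosts", {})
        for name, hostvars in sorted(hosts.items()):
            lines.append(f"{hostvars['ansible_host']} {name}.{domain} {name}")
    return lines


# Hypothetical sample data for illustration only.
inventory = {
    "Controller": {"hosts": {"controller-0": {"ansible_host": "192.168.24.20"}}},
    "Compute": {
        "hosts": {
            "compute-1": {"ansible_host": "192.168.24.50"},
            "compute-0": {"ansible_host": "192.168.24.10"},
        }
    },
}
entries = render_hosts_entries(inventory)
print("\n".join(entries))
```
&lt;/pre&gt;
&lt;p&gt;Because the entries are computed from the inventory at run time, nodes that
exist only in the extra inventory data are covered without any change on the
Heat side.&lt;/p&gt;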
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Scaling is not currently tested in CI; however, it could perhaps be tested
with this change.&lt;/p&gt;
&lt;p&gt;Manual test plans and other test automation would need to be updated to also
test scaling with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation needs to be added for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ExtraInventoryData&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The feature should also be fully explained, since users and deployers need to
be made aware of the change in how nodes may or may not be represented in the
Heat stack.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html"&gt;https://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/undercloud_minion.html"&gt;https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/undercloud_minion.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_compute_node.html"&gt;https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_compute_node.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 16 Oct 2019 00:00:00 </pubDate></item><item><title>TripleO Squads</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/policy/squads.html</link><description>
 
&lt;p&gt;Scaling up a team is a common challenge in OpenStack.
The number of projects and contributors keeps growing,
which often requires changes in how the team is organized.
This policy documents how we will address this challenge in
the TripleO project.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Projects usually start from a single git repository and very often grow to
dozens of repositories that do different things.  As a project matures,
people who work together on the same topic need some space
to collaborate in the open.
Currently, TripleO acts as a single team where everyone meets
on IRC once a week to talk about bugs, CI status, and release management.
Also, new contributors very often have a hard time finding
an area where they could quickly start to contribute.
Time is precious for our developers and we need to find a way to allow
them to keep their focus on their area of work.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="policy"&gt;
&lt;h2&gt;Policy&lt;/h2&gt;
&lt;p&gt;The idea of this policy is to create squads of people who work on the
same topic and allow them to stay focused with few external
distractions.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Anyone would be free to join and leave a squad at will.
Right now, there is no size limit for a squad as this is something
experimental. If we realize a squad is too big (more than 10 people),
we might reconsider the area of focus of the squad.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anyone can join one or multiple squads at the same time. Squads will be
documented in a place anyone can contribute.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Squads are free to organize their own weekly meeting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;#tripleo remains the official IRC channel.  We won’t add more channels.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Squads will have to choose a representative, who will act as the squad
liaison with the TripleO PTL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The TripleO weekly meeting will still exist and anyone is encouraged to
join, but topics will stay high level.  Some examples of topics: release
management, horizontal discussion between squads, CI status, etc.
The meeting will be a TripleO cross-project meeting.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We might need to test the idea for at least 1 or 2 months and invest some
time to reflect on what is working and what could be improved.&lt;/p&gt;
&lt;section id="benefits"&gt;
&lt;h3&gt;Benefits&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;More collaboration is expected between people working on the same topic.
It will make official what we have nearly been doing over the last cycles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;People working on the same area of TripleO would have the possibility
to hold public and open meetings, where anyone would be free to join.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Newcomers would more easily understand what the TripleO project delivers
since squads would provide a good overview of the work we do.  Also
it would be an opportunity for people who want to learn about a specific
area of TripleO to join a new squad and learn from others.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It opens up more possibilities, like setting up a mentoring program for
each squad, or specific docs to get involved more quickly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="challenges"&gt;
&lt;h3&gt;Challenges&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We need to avoid creating silos and keep horizontal collaboration.
Working on a squad doesn’t mean you need to ignore other squads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="squads"&gt;
&lt;h3&gt;Squads&lt;/h3&gt;
&lt;p&gt;The list tends to be dynamic over the cycles, depending on which topics
the team is working on. The list below is subject to change as squads change.&lt;/p&gt;
&lt;table class="docutils align-default"&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Squad&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;ci&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Group of people focusing on Continuous Integration tooling and system&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;upgrade&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Group of people focusing on TripleO upgrades&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;validations&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Group of people focusing on TripleO validations tooling&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;networking&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Group of people focusing on networking bits in TripleO&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;integration&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Group of people focusing on configuration management (eg: services)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;security&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Group of people focusing on security&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;edge&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Group of people focusing on Edge/multi-site/multi-cloud
&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-edge-squad-status"&gt;https://etherpad.openstack.org/p/tripleo-edge-squad-status&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;transformation&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Group of people focusing on converting heat templates / puppet to Ansible
within the tripleo-ansible framework&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Note about CI: the squad is about working together on the tooling used
by OpenStack Infra to test TripleO, though every squad is in charge of
keeping its own tests in good shape.&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives-history"&gt;
&lt;h2&gt;Alternatives &amp;amp; History&lt;/h2&gt;
&lt;p&gt;One alternative would be to continue as we do today and keep a single
horizontal team.  As we welcome more people into the team and add more
projects, the problem of scaling up the TripleO project will become more severe.
The number of people involved and the variety of topics make it really difficult for anyone to work on everything.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="author-s"&gt;
&lt;h3&gt;Author(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary author:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;emacchi&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="milestones"&gt;
&lt;h3&gt;Milestones&lt;/h3&gt;
&lt;p&gt;Ongoing&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Work with TripleO developers to document the area of work for every squad.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document the output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document squad members.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set up squad meetings if needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each squad, find a liaison or a squad leader.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0
Unported License.
&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
</description><pubDate>Mon, 03 Jun 2019 00:00:00 </pubDate></item><item><title>Scale Undercloud with a Minion</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/train/undercloud-minion.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/undercloud-minion"&gt;https://blueprints.launchpad.net/tripleo/undercloud-minion&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In order to improve our scale, we have identified heat-engine and possibly
ironic-conductor as services that we can add on to an existing undercloud
deployment.  Adding heat-engine allows for additional processing capacity
when creating and updating stacks for deployment.  By adding a new
lightweight minion node, we can scale the Heat capacity horizontally.&lt;/p&gt;
&lt;p&gt;Additionally, since these nodes could be more remote, we could add an
ironic-conductor instance to be able to manage hosts in a remote region
while still having a central undercloud for the main management.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently we use a single heat-engine on the undercloud for the deployment.
According to the Heat folks, it can be beneficial for processing to have
additional heat-engine instances for scale. The recommendation is to scale out
rather than up.  Additionally, by being able to deploy a secondary host, we
can increase our capacity for the undercloud when additional scale capacity
is required.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;We are proposing to add a new undercloud “minion” configuration that can be
used by operators to configure additional instances of heat-engine and
ironic-conductor when they need more processing capacity.  We would also
allow the operator to disable heat-engine from the main undercloud to reduce
the resource usage of the undercloud.  By removing the heat-engine from the
regular undercloud, the operator could possibly avoid timeouts on other services
like keystone and neutron that can occur when the system is under load.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;An alternative would be to make the undercloud deployable in a traditional
HA capacity where we share the services across multiple nodes. This would
increase the overall capacity but adds additional complexity to the undercloud.
Additionally, this does not let us target specific services that are resource
heavy.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The new node would need to have access to the main undercloud’s keystone,
database and messaging services.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;The new minion role would need to be able to be upgraded by the user.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This additional minion role may improve heat processing due to the additional
resource capacity being provided.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Locating an ironic-conductor closer to the nodes being managed can improve
performance by being closer to the systems (less latency, etc).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;An additional undercloud role and a new undercloud-minion.conf configuration
file will be created. Additionally, a new option may be added to undercloud.conf
to manage heat-engine installation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;mwhahaha&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;slagle
EmilienM&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;section id="python-tripleoclient"&gt;
&lt;h4&gt;python-tripleoclient&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;New ‘openstack undercloud minion deploy’ command for installation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New ‘openstack undercloud minion upgrade’ command for upgrades&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New configuration file ‘undercloud-minion.conf’ to drive the installation
and upgrades.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New configuration option in ‘undercloud.conf’ to provide ability to disable
the heat-engine on the undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="tripleo-heat-templates"&gt;
&lt;h4&gt;tripleo-heat-templates&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;New ‘UndercloudMinion’ role file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New environment file for the undercloud minion deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Additional environment files to enable or disable heat-engine and
ironic-conductor.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We would add a new CI job to test the deployment of the minion node. This job
will likely be a new multinode job.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will need to document the usage of the undercloud minion installation and
the specific use cases where this can be beneficial.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;See the notes from the Train PTG around Scaling.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-ptg-train"&gt;https://etherpad.openstack.org/p/tripleo-ptg-train&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/DEN-tripleo-forum-scale"&gt;https://etherpad.openstack.org/p/DEN-tripleo-forum-scale&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Fri, 03 May 2019 00:00:00 </pubDate></item><item><title>Move certificate management in tripleo-heat-templates</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/train/certificate-management.html</link><description>
 
&lt;p&gt;Launchpad blueprint:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/ansible-certmonger"&gt;https://blueprints.launchpad.net/tripleo/+spec/ansible-certmonger&lt;/a&gt;&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;There are multiple issues with the current way certificates are managed with
Puppet and Certmonger, especially in a containerized environment:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Multiple containers are using the same certificate&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There isn’t any easy way to find out which container needs to be restarted
upon certificate renewal&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shared certificates are bad&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The main issue now is the use of “pkill”, especially for httpd services. Since
Certmonger has no knowledge of what container has an httpd service running,
it uses a wide fly swatter in the hope all related services will effectively
be reloaded with the new certificate.&lt;/p&gt;
&lt;p&gt;The usage of “pkill” by Certmonger is prevented on a SELinux enforcing host.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="introduction"&gt;
&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;While the use of certmonger isn’t in question, the way we’re using it is.&lt;/p&gt;
&lt;p&gt;The goal of this document is to describe how we could change that usage
to provide better security while allowing Certmonger to restart
only the needed containers in an easy fashion.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implement-certmonger-in-ansible"&gt;
&lt;h3&gt;Implement certmonger in Ansible&lt;/h3&gt;
&lt;p&gt;A first step will be to implement a certmonger “thing” in Ansible. There are
two ways to do that:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Reusable role&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Native Ansible module&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While the first one is faster to implement, the second would be better, since
it will provide a clean way to manage the certificates.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="move-certificate-management-to-tripleo-heat-templates"&gt;
&lt;h3&gt;Move certificate management to tripleo-heat-templates&lt;/h3&gt;
&lt;p&gt;Once we have a way to manage Certmonger within Ansible, we will be able to move
the calls directly into the relevant tripleo-heat-templates files, which allows
generating per-container certificates.&lt;/p&gt;
&lt;p&gt;Doing so will also allow Certmonger to know exactly which container to
restart upon certificate renewal, using a simple “container_cli kill” command.&lt;/p&gt;
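&lt;p&gt;A minimal sketch of how a per-container post-save command might be assembled
follows. The certificate paths, service name, and container name are
hypothetical, and sending HUP is an assumption: whether a signal is enough to
reload the certificate depends on the service running in the container. The
sketch only builds the argument list and does not call Certmonger.&lt;/p&gt;
&lt;pre&gt;
```python
import shlex


def build_getcert_request(service, container, container_cli="podman"):
    """Build a getcert argv that signals one specific container on renewal."""
    # The post-save command targets a single container instead of using pkill.
    post_save = f"{container_cli} kill --signal HUP {container}"
    return [
        "getcert", "request",
        "-f", f"/etc/pki/tls/certs/{service}.crt",    # hypothetical cert path
        "-k", f"/etc/pki/tls/private/{service}.key",  # hypothetical key path
        "-C", post_save,
    ]


argv = build_getcert_request("haproxy", "haproxy")
print(shlex.join(argv))
```
&lt;/pre&gt;
&lt;p&gt;With one tracking request per container, the command that runs after renewal
no longer needs to guess which processes use the certificate.&lt;/p&gt;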
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h2&gt;Alternatives&lt;/h2&gt;
&lt;p&gt;One alternative is proposed.&lt;/p&gt;
&lt;section id="maintain-a-list"&gt;
&lt;h3&gt;Maintain a list&lt;/h3&gt;
&lt;p&gt;We could maintain the code as-is, and just add a list for the containers
needing a restart/reload. Certmonger would loop on that list, and do its
job upon certificate renewal.&lt;/p&gt;
&lt;p&gt;This isn’t a good solution, since the list will eventually fall out of date, and
this will create new issues instead of solving the current ones.&lt;/p&gt;
&lt;p&gt;Also, it doesn’t allow per-container certificates, which is bad.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="proposed-roadmap"&gt;
&lt;h2&gt;Proposed roadmap&lt;/h2&gt;
&lt;p&gt;In Stein:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create “tripleo-certmonger” Ansible reusable role in tripleo-common&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In Train:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Move certificate management/generation within tripleo-heat-templates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Evaluate the benefits of moving to a proper Ansible module for Certmonger.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If evaluation is good and we have time, implement it and update current code.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In “U” release:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Check if anything relies on puppet-certmonger, and if not, drop this module.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h2&gt;Security Impact&lt;/h2&gt;
&lt;p&gt;We will provide a better security level by avoiding shared X.509 key pairs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h2&gt;Upgrade Impact&lt;/h2&gt;
&lt;p&gt;Every container using the shared certificate will be restarted in order to
load the new, dedicated one.&lt;/p&gt;
&lt;p&gt;We will have to ensure the nova metadata is properly updated in order to
let novajoin create the services in FreeIPA, so that per-service
certificates can be requested.&lt;/p&gt;
&lt;p&gt;Tests should also be made regarding novajoin update/upgrade in order to ensure
all is working as expected.&lt;/p&gt;
&lt;p&gt;If the containers are already using dedicated certificates, no other impact is
expected.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="end-user-impact"&gt;
&lt;h2&gt;End User Impact&lt;/h2&gt;
&lt;p&gt;During the upgrade, a standard short downtime is to be expected, unless
the deployment is done using HA.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h2&gt;Performance Impact&lt;/h2&gt;
&lt;p&gt;No major performance impact is expected.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="deployer-impact"&gt;
&lt;h2&gt;Deployer Impact&lt;/h2&gt;
&lt;p&gt;No major deployer impact is expected.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h2&gt;Developer Impact&lt;/h2&gt;
&lt;p&gt;People adding new services requiring a certificate will need to call the
Certmonger module/role in the new tripleo-heat-templates file.&lt;/p&gt;
&lt;p&gt;They will also need to ensure new metadata is properly generated in order to
let novajoin create the related service in FreeIPA.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="contributors"&gt;
&lt;h3&gt;Contributors&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Cédric Jeanneret&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Grzegorz Grasza&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nathan Kinder&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement reusable role for Certmonger&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move certificate management to tripleo-heat-templates&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove certmonger parts from Puppet&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update/create needed documentation about certificate management&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
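&lt;p&gt;Later:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement a proper Ansible Module&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the role in order to wrap module calls&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;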
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None - currently, no Certmonger module for Ansible exists.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We have to ensure the dedicated certificate is generated with the right
content, and ensure it’s served by the right container.&lt;/p&gt;
&lt;p&gt;We can do that using the openssl CLI, possibly by adding a new check to the
CI via a new role in tripleo-quickstart-extras.&lt;/p&gt;
&lt;p&gt;This is also deeply linked to novajoin, thus we have to ensure it works as
expected.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will need to document how the certificates are managed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/ansible/ansible/tree/devel/lib/ansible/modules/crypto"&gt;Example of existing certificate management in Ansible&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/nkinder/ansible/commit/c2f74d07e6b71055fad2207ed26ae82bb8beffc3"&gt;Skeleton certmonger_getcert&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-common/tree/master/roles"&gt;Existing reusable roles in TripleO&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Fri, 22 Mar 2019 00:00:00 </pubDate></item><item><title>Provide a common Validation Framework inside python-tripleoclient</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/validation-framework.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/validation-framework"&gt;https://blueprints.launchpad.net/tripleo/+spec/validation-framework&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Currently, we’re lacking a common validation framework in tripleoclient. This
framework should provide an easy way to validate the environment prior to
deployment and prior to update/upgrade, on both undercloud and overcloud.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently, we have two types of validations:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Those launched prior to the undercloud deploy, embedded into the deploy itself&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Those launched at will via a Mistral Workflow&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is no unified, easy way to call individual validations on their own,
and we lack the capacity to easily add new validations for the undercloud
preflight checks.&lt;/p&gt;
&lt;p&gt;The current situation is not optimal: the operator must either go through the
UI to run validations, or run them from the CLI using the exact same workflows
as the UI. Neither option can provide proper preflight validations, especially
when there is no working Mistral (prior to the undercloud deploy, or with
all-in-one/standalone).&lt;/p&gt;
&lt;p&gt;Moreover, there is a need to make the CLI and UI converge. The latter already
uses the full list of validations. Adding the full support of
tripleo-validations to the CLI will improve the overall quality, usability and
maintenance of the validations.&lt;/p&gt;
&lt;p&gt;Finally, a third type should be added: service validations called during the
deploy itself. This doesn’t directly affect the tripleoclient codebase, but
tripleo-heat-templates.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;In order to improve the current situation, we propose to create a new
“branching” in the tripleoclient commands: &lt;cite&gt;openstack tripleo validator&lt;/cite&gt;&lt;/p&gt;
&lt;p&gt;This new subcommand will allow listing and running validations
independently.&lt;/p&gt;
&lt;p&gt;Doing so will give a clear and clean view of the validations we can run
depending on the stage we’re in.&lt;/p&gt;
&lt;p&gt;(Note: the subcommand has yet to be defined - this is only a “mock-up”.)&lt;/p&gt;
&lt;p&gt;The following subcommands should be supported:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;tripleo&lt;/span&gt; &lt;span class="pre"&gt;validator&lt;/span&gt; &lt;span class="pre"&gt;list&lt;/span&gt;&lt;/code&gt;: will display all the available
validations with a small description, like “validate network capabilities on
undercloud”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;tripleo&lt;/span&gt; &lt;span class="pre"&gt;validator&lt;/span&gt; &lt;span class="pre"&gt;run&lt;/span&gt;&lt;/code&gt;: will run the validations. Should take
options, like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--validation-name&lt;/span&gt;&lt;/code&gt;: run only the passed validation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--undercloud&lt;/span&gt;&lt;/code&gt;: runs all undercloud-related validations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--overcloud&lt;/span&gt;&lt;/code&gt;: runs all overcloud-related validations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--use-mistral&lt;/span&gt;&lt;/code&gt;: runs validations through Mistral&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--use-ansible&lt;/span&gt;&lt;/code&gt;: runs validations directly via Ansible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--plan&lt;/span&gt;&lt;/code&gt;: allows running validations against a specific plan. Defaults to
$TRIPLEO_PLAN_NAME or “overcloud”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;in addition, common options for all the subcommands:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--extra-roles&lt;/span&gt;&lt;/code&gt;: path to a local directory containing validation
roles maintained by the operator, or swift directory containing extra
validation roles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--output&lt;/span&gt;&lt;/code&gt;: points to a valid Ansible output_callback, such as the native
&lt;em&gt;json&lt;/em&gt;, or custom &lt;em&gt;validation_output&lt;/em&gt;. The default one should be the latter
as it renders a “human readable” output. More callbacks can be added later.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
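&lt;p&gt;The option layout above can be sketched as a small parser. This is only an illustration under stated assumptions: it uses the standard library &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;argparse&lt;/span&gt;&lt;/code&gt; module as a stand-in rather than the actual tripleoclient command classes, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;build_parser&lt;/span&gt;&lt;/code&gt; helper is hypothetical, and treating the two engine flags as mutually exclusive is an assumption the spec does not state.&lt;/p&gt;

```python
import argparse
import os


def build_parser():
    """Build a stand-in parser for the proposed 'validator' subcommands."""
    # Options shared by every subcommand, as listed in the spec.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--extra-roles", default=None,
                        help="local directory or swift container with extra roles")
    common.add_argument("--output", default="validation_output",
                        help="Ansible output callback used to render results")

    parser = argparse.ArgumentParser(prog="openstack tripleo validator")
    subcommands = parser.add_subparsers(dest="action", required=True)
    subcommands.add_parser("list", parents=[common])

    run = subcommands.add_parser("run", parents=[common])
    run.add_argument("--validation-name", default=None)
    run.add_argument("--undercloud", action="store_true")
    run.add_argument("--overcloud", action="store_true")
    # Assumption: the two engines cannot be requested together.
    engine = run.add_mutually_exclusive_group()
    engine.add_argument("--use-mistral", action="store_true")
    engine.add_argument("--use-ansible", action="store_true")
    # Defaults to $TRIPLEO_PLAN_NAME or "overcloud", as described above.
    run.add_argument("--plan",
                     default=os.environ.get("TRIPLEO_PLAN_NAME") or "overcloud")
    return parser
```

&lt;p&gt;With such a parser, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;build_parser().parse_args(["run",&lt;/span&gt; &lt;span class="pre"&gt;"--undercloud"])&lt;/span&gt;&lt;/code&gt; would yield a namespace whose &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;plan&lt;/span&gt;&lt;/code&gt; attribute follows the environment variable when it is set.&lt;/p&gt;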
&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--extra-roles&lt;/span&gt;&lt;/code&gt; option must support both a local path and a remote swift
container, since the custom validation support will push any validation to a
dedicated swift directory.&lt;/p&gt;
&lt;p&gt;The default engine will be determined by the presence of Mistral: if Mistral is
present and accepting requests (meaning the Undercloud is most probably
deployed), the validator has to use it by default. If no Mistral is present, it
must fall back to ansible-playbook.&lt;/p&gt;
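&lt;p&gt;The engine selection rule can be sketched as a pure function. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;choose_engine&lt;/span&gt;&lt;/code&gt; name and its parameters are hypothetical, and the way explicit flags interact with the default (an explicit flag wins, and asking for an unreachable Mistral is an error) is an assumption, not something the spec defines. How Mistral availability is probed is left out on purpose.&lt;/p&gt;

```python
def choose_engine(mistral_available, use_mistral=False, use_ansible=False):
    """Return "mistral" or "ansible" following the default rule in the spec."""
    if use_mistral and use_ansible:
        raise ValueError("--use-mistral and --use-ansible are mutually exclusive")
    if use_ansible:
        return "ansible"
    if use_mistral:
        # Assumption: an explicit request for an unreachable Mistral fails loudly.
        if not mistral_available:
            raise RuntimeError("Mistral was requested but is not reachable")
        return "mistral"
    # No explicit choice: prefer Mistral when it answers, else fall back.
    return "mistral" if mistral_available else "ansible"
```

&lt;p&gt;Keeping the decision separate from the availability probe makes the rule easy to unit test without a running undercloud.&lt;/p&gt;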
&lt;p&gt;The validations should be in the form of Ansible roles, so that they can
easily be accessed from Mistral as well (as is currently the case). Roles
also allow for proper documentation and a common structure, and make it
possible to validate a role before running it (ensuring there are metadata,
output, and so on).&lt;/p&gt;
&lt;p&gt;We might also create some dedicated roles in order to make a kind of
“self validation”, ensuring we actually can run the validations (network,
resources, and so on).&lt;/p&gt;
&lt;p&gt;The UI uses Mistral workflows in order to run the validations - the CLI must
be able to use those same workflows of course, but also run at least some
validations directly via ansible, especially when we want to validate the
undercloud environment before we even deploy it.&lt;/p&gt;
&lt;p&gt;Also, in order to avoid Mistral modification, playbooks including validation
roles will be created.&lt;/p&gt;
&lt;p&gt;In the end, all the default validation roles should be in one and only one
location: tripleo-validations. Since support for “custom validations” is being
added, such custom validations should also be supported (see references for
details).&lt;/p&gt;
&lt;p&gt;In order to get a proper way to “aim” the validations, proper validation groups
must be created and documented. Of course, one validation can be part of
multiple groups.&lt;/p&gt;
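&lt;p&gt;Group-based targeting could look like the following sketch. The catalogue content, the group names and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;select_validations&lt;/span&gt;&lt;/code&gt; helper are all hypothetical; the only property taken from the text is that one validation may belong to several groups.&lt;/p&gt;

```python
# Hypothetical catalogue: validation name mapped to the groups it belongs to.
VALIDATION_GROUPS = {
    "undercloud-ram": {"prep", "pre-undercloud"},
    "undercloud-disk-space": {"prep", "pre-undercloud", "pre-upgrade"},
    "ceph-health": {"post-deployment"},
}


def select_validations(catalogue, wanted=(), excluded=()):
    """Return sorted names found in any wanted group and in no excluded group.

    An empty `wanted` means "every group".
    """
    wanted, excluded = set(wanted), set(excluded)
    selected = []
    for name, groups in catalogue.items():
        if wanted and not groups.intersection(wanted):
            continue
        if groups.intersection(excluded):
            continue
        selected.append(name)
    return sorted(selected)
```

&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;excluded&lt;/span&gt;&lt;/code&gt; argument is one possible way to leave out a group of expensive validations while still running everything else.&lt;/p&gt;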
&lt;p&gt;In addition, proper documentation with examples describing the good practices
regarding the roles’ content, format and outputs should be created.&lt;/p&gt;
&lt;p&gt;For instance, a role should contain a description, a “human readable error
output”, and if applicable a possible solution.&lt;/p&gt;
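&lt;p&gt;A check of that kind could be as small as the sketch below. The field names (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;description&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;error_message&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;solution&lt;/span&gt;&lt;/code&gt;) and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;lint_role_metadata&lt;/span&gt;&lt;/code&gt; helper are assumptions made for illustration; the actual metadata format is yet to be defined.&lt;/p&gt;

```python
# Assumed field names; the spec only says a role should carry a description,
# a human readable error output and, if applicable, a possible solution.
REQUIRED_FIELDS = ("description", "error_message")
OPTIONAL_FIELDS = ("solution",)


def lint_role_metadata(metadata):
    """Return a list of problems found in a role's metadata mapping."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append("missing or empty field: " + field)
    for field in OPTIONAL_FIELDS:
        # Optional fields may be absent, but must not be present and blank.
        if field in metadata and not str(metadata[field]).strip():
            problems.append("empty optional field: " + field)
    return problems
```

&lt;p&gt;Returning the list of problems, rather than raising on the first one, lets a test report everything a new role is missing in a single run.&lt;/p&gt;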
&lt;p&gt;Proper testing for the default validations (i.e. those in tripleo-validations)
might be added as well in order to ensure a new validation follows the Good
Practices.&lt;/p&gt;
&lt;p&gt;We might want to add support for “nagios-compatible outputs” and exit codes,
but it is unclear whether running those validations through a monitoring tool
is a good idea, given the load it might create. This has to be discussed
later, once the framework is in place.&lt;/p&gt;
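&lt;p&gt;Should that discussion happen, the mapping itself is simple. The sketch below uses the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nagios_status&lt;/span&gt;&lt;/code&gt; helper and the three result strings it accepts are hypothetical and do not describe any existing callback.&lt;/p&gt;

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def nagios_status(results):
    """Collapse per-validation results into one Nagios exit code and status line.

    `results` maps a validation name to "passed", "warning" or "failed";
    those three strings are an assumption made for this sketch.
    """
    if not results:
        return UNKNOWN, "UNKNOWN - no validation was run"
    failed = sorted(n for n, r in results.items() if r == "failed")
    warned = sorted(n for n, r in results.items() if r == "warning")
    if failed:
        code, detail = CRITICAL, "failed: " + ", ".join(failed)
    elif warned:
        code, detail = WARNING, "warnings: " + ", ".join(warned)
    else:
        code, detail = OK, "{} validation(s) passed".format(len(results))
    return code, "{} - {}".format(LABELS[code], detail)
```

&lt;p&gt;The worst result wins, which matches how monitoring checks are usually aggregated; whether running validations that way is desirable remains an open question.&lt;/p&gt;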
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There are no real alternatives. Currently, we have many ways to validate, but
they are unrelated and uncoordinated. If we don’t provide a unified framework,
we will get more and more ad hoc “side validations” and the result won’t be
maintainable.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Rights might be needed for some validations - they should be added accordingly
in the system sudoers, in a way that limits unwanted privilege escalations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The end user will get a proper way to validate the environment prior to any
action.
This will give more confidence in the final product, and ease the update and
upgrade processes.&lt;/p&gt;
&lt;p&gt;It will also provide a good way to collect information about the systems in
case of failures.&lt;/p&gt;
&lt;p&gt;If a “nagios-compatible output” is to be created (mix of ansible JSON output,
parsing and compatibility stuff), it might provide a way to get a daily report
about the health of the stack - this might be a nice feature, but not in the
current scope (will need a new stdout_callback for instance).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The more validations we get, the more time it might take if we decide to run
them by default prior to any action.&lt;/p&gt;
&lt;p&gt;The current way to disable them, either with a configuration file or a CLI
option, will stay.&lt;/p&gt;
&lt;p&gt;In addition, we can make good use of “groups” in order to filter out greedy
validations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Providing a CLI subcommand for validation will make the deployment easier.&lt;/p&gt;
&lt;p&gt;Providing a unified framework will allow an operator to run the validations
either from the UI, or from the CLI, without any surprise regarding the
validation list.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;A refactoring will be needed in python-tripleoclient and probably in
tripleo-common in order to get a proper subcommand and options.&lt;/p&gt;
&lt;p&gt;A correct way to call Ansible from Python is to be decided (ansible-runner?).&lt;/p&gt;
&lt;p&gt;A correct way to call Mistral workflows from the CLI is to be created if it
does not already exist.&lt;/p&gt;
&lt;p&gt;In the end, the framework will allow other OpenStack projects to push their own
validations, since they are the ones who know how and what to validate in the
different services that make up OpenStack.&lt;/p&gt;
&lt;p&gt;All validations will be centralized in the tripleo-validations repository.
This means we might want to create a proper tree in order to avoid having
100+ validations in the same directory.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;cjeanner&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;akrivoka
ccamacho
dpeacock
florianf&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;List current existing validations in both undercloud_preflight.py and
openstack-tripleo-validations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decide if we integrate ansible-runner as a dependency (needs to be packaged).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement the undercloud_preflight validations as Ansible roles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement a proper way to call Ansible from the tripleoclient code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement support for a configuration file dedicated for the validations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement the new subcommand tree in tripleoclient.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Validate, Validate, Validate.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Ansible-runner: &lt;a class="reference external" href="https://github.com/ansible/ansible-runner"&gt;https://github.com/ansible/ansible-runner&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Openstack-tripleo-validations: &lt;a class="reference external" href="https://github.com/openstack/tripleo-validations"&gt;https://github.com/openstack/tripleo-validations&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The CI can’t possibly provide the “right” environment with all the requirements.
The code has to implement a way to configure the validations so that the CI
can override the &lt;em&gt;production&lt;/em&gt; values we will set in the validations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;A new entry in the documentation must be created in order to describe this new
framework (for the devs) and new subcommand (for the operators).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2018-July/132263.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2018-July/132263.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1599829"&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1599829&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1601739"&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1601739&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/569513"&gt;https://review.openstack.org/569513&lt;/a&gt; (custom validation support)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/tripleo-docs/latest/install/validations/validations.html"&gt;https://docs.openstack.org/tripleo-docs/latest/install/validations/validations.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Tue, 12 Feb 2019 00:00:00 </pubDate></item><item><title>Integrate os_tempest role with TripleO</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/ostempest-tripleo.html</link><description>
 
&lt;p&gt;Launchpad Blueprint:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/os-tempest-tripleo"&gt;https://blueprints.launchpad.net/tripleo/+spec/os-tempest-tripleo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Tempest provides a set of API and integration tests, with batteries
included, in order to validate an OpenStack deployment. In the TripleO
project, we are working towards using a unified tempest role, i.e.
&lt;cite&gt;os_tempest&lt;/cite&gt; provided by the OpenStack Ansible project, in TripleO CI
in order to foster collaboration with multiple deployment tools and
improve our testing strategies within the OpenStack community.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;In the OpenStack ecosystem, we have multiple &lt;em&gt;ansible based&lt;/em&gt; deployment tools
that use their own roles for installing, configuring and running tempest
testing. Each of these roles tries to do similar things but is tied to its own
deployment tool. For example, the &lt;cite&gt;validate-tempest&lt;/cite&gt; ansible role in TripleO CI
provides most of what is needed, but it is tied to the TripleO deployment, and
it provides some nice features (such as bugcheck, failed tests email
notification, stackviz, python-tempestconf support for auto tempest.conf
generation) which are missing in other roles. This leads to duplication and
reduces visibility into which tempest tests are not working across the tools,
leaving no collaboration on the testing side.&lt;/p&gt;
&lt;p&gt;The OpenStack Ansible team provides the &lt;cite&gt;os_tempest&lt;/cite&gt; role for installing,
configuring and running tempest and for post-processing tempest results, and
there is a lot of duplication between their work and the roles used for testing
by the various deployment tools. It provides most of what each of the
deployment tool specific tempest roles provides. The few things that are
missing can be added to the role to make it usable, so that other deployment
tools can consume it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;Using the unified &lt;cite&gt;os_tempest&lt;/cite&gt; ansible role in TripleO CI will mean one
less role to maintain within the TripleO project, and will help us collaborate
with the openstack-ansible team in order to share and improve test strategies
across the OpenStack ecosystem and solve tempest issues faster.&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;In order to achieve that, we need to:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Improve &lt;cite&gt;os_tempest&lt;/cite&gt; role to add support for package/container install,
python-tempestconf, stackviz, skip list, bugcheck, tempest
log collection at the proper place.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Have a working CI job on standalone running tempest from &lt;cite&gt;os_tempest&lt;/cite&gt;
role as well as on OSA side.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide an easy migration path from validate-tempest role.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;If we do not use the existing &lt;cite&gt;os_tempest&lt;/cite&gt; role, then we need to rewrite the
&lt;cite&gt;validate-tempest&lt;/cite&gt; role, which will again result in duplication, will cost
too much time, and will require another round of effort for adoption in the
community, which does not seem feasible.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;We need to educate users on migrating to &lt;cite&gt;os_tempest&lt;/cite&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Fosters more collaboration and improves testing.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Arx Cruz (arxcruz)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chandan Kumar (chkumar246)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Martin Kopec (mkopec)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Install tempest and its dependencies from Distro packages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Running tempest from containers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable stackviz&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;python-tempestconf support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;skiplist management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keeping all tempest related files in one place&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bugcheck&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Standalone based TripleO CI job consuming os_tempest role&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migration path from validate-tempest to os_tempest role&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation update on How to use it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RDO packaging&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Currently, os_tempest role depends on &lt;cite&gt;python_venv_build&lt;/cite&gt; role when
tempest is installed from source (git, pip, venv). We need to package it in RDO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The unified tempest role &lt;cite&gt;os_tempest&lt;/cite&gt; will replace the validate-tempest
role and bring many improvements.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation on how to consume &lt;cite&gt;os_tempest&lt;/cite&gt; needs to be updated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Unified Tempest role creation &amp;amp; collaboration email:
&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2018-August/133838.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2018-August/133838.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;os_tempest role:
&lt;a class="reference external" href="http://git.openstack.org/cgit/openstack/openstack-ansible-os_tempest"&gt;http://git.openstack.org/cgit/openstack/openstack-ansible-os_tempest&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Mon, 14 Jan 2019 00:00:00 </pubDate></item><item><title>Expedited Approvals</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/policy/expedited-approvals.html</link><description>
 
&lt;p&gt;In general, TripleO follows the “2 +2” review standard, but there are
situations where we want to make an exception.  This policy is intended to
document those exceptions.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Core reviewer time is precious, and there is never enough of it.  In some
cases, requiring 2 +2’s on a patch is a waste of that core time, so we need
to be reasonable about when to make exceptions.  While core reviewers are
always free to use their judgment about when to merge or not merge a patch,
it can be helpful to list some specific situations where it is acceptable and
even expected to approve a patch with a single +2.&lt;/p&gt;
&lt;p&gt;Part of this information is already in the wiki, but the future of the wiki
is in doubt, and it’s better to put policies in a place where they can be
reviewed anyway.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="policy"&gt;
&lt;h2&gt;Policy&lt;/h2&gt;
&lt;section id="single-2-approvals"&gt;
&lt;h3&gt;Single +2 Approvals&lt;/h3&gt;
&lt;p&gt;A core can and should approve patches without a second +2 under the following
circumstances:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The change has multiple +2’s on previous patch sets, indicating an agreement
from the other cores that the overall design is good, and any alterations to
the patch since those +2’s must be minor implementation details only -
trivial rebases, minor syntax changes, or comment/documentation changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Backports proposed by another core reviewer.  Backports should already have
been reviewed for design when they merged to master, so if two cores agree
that the backport is good (one by proposing, the other by reviewing), they
can be merged with a single +2 review.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Requirements updates proposed by the bot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Translation updates proposed by the bot. (See also &lt;a class="reference external" href="https://docs.openstack.org/i18n/latest/reviewing-translation-import.html"&gt;reviewing
translation imports&lt;/a&gt;.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="co-author-2"&gt;
&lt;h3&gt;Co-author +2&lt;/h3&gt;
&lt;p&gt;Co-authors on a patch are allowed to +2 that patch, but at least one +2 from a
core not listed as a co-author is required to merge the patch.  For example, if
core A pushes a patch with cores B and C as co-authors, core B and core C are
both allowed to +2 that patch, but another core is required to +2 before the
patch can be merged.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="self-approval"&gt;
&lt;h3&gt;Self-Approval&lt;/h3&gt;
&lt;p&gt;It is acceptable for a core to self-approve a patch they submitted if it has the
requisite 2 +2’s and a CI pass.  However, this should not be done if there is any
dispute about the patch, such as on a change with 2 +2’s and an unresolved -1.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="note-on-ci"&gt;
&lt;h3&gt;Note on CI&lt;/h3&gt;
&lt;p&gt;This policy does not affect CI requirements.  Patches must still pass CI before
merging.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives-history"&gt;
&lt;h2&gt;Alternatives &amp;amp; History&lt;/h2&gt;
&lt;p&gt;This policy has been in effect for a while now, but not every TripleO core is
aware of it, so it is simply being written down in an official location for
reference.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="author-s"&gt;
&lt;h3&gt;Author(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary author:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bnemec&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="milestones"&gt;
&lt;h3&gt;Milestones&lt;/h3&gt;
&lt;p&gt;The policy is already in effect.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;Ensure all cores are aware of the policy.  Once the policy has merged, an email
should be sent to openstack-dev referring to it.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Existing wiki on review guidelines:
&lt;a class="reference external" href="https://wiki.openstack.org/wiki/TripleO/ReviewGuidelines"&gt;https://wiki.openstack.org/wiki/TripleO/ReviewGuidelines&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Previous spec that implemented some of this policy:
&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/kilo/tripleo-review-standards.html"&gt;https://specs.openstack.org/openstack/tripleo-specs/specs/kilo/tripleo-review-standards.html&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="revision-history"&gt;
&lt;h2&gt;Revision History&lt;/h2&gt;
&lt;table class="docutils align-default" id="id1"&gt;
&lt;caption&gt;&lt;span class="caption-text"&gt;Revisions&lt;/span&gt;&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Release Name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Newton&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Introduced&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-odd"&gt;&lt;td&gt;&lt;p&gt;Newton&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Added co-author +2 policy&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Ocata&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Added note on translation imports&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0
Unported License.
&lt;a class="reference external" href="https://creativecommons.org/licenses/by/3.0/legalcode"&gt;https://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Fri, 14 Dec 2018 00:00:00 </pubDate></item><item><title>Major Upgrades Including Operating System Upgrade</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/upgrades-with-operating-system.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/upgrades-with-os"&gt;https://blueprints.launchpad.net/tripleo/+spec/upgrades-with-os&lt;/a&gt;&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Abbreviation “OS” in this spec stands for “operating system”, not
“OpenStack”.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;So far all our update and upgrade workflows included doing minor
operating system updates (essentially a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;yum&lt;/span&gt; &lt;span class="pre"&gt;update&lt;/span&gt;&lt;/code&gt;) on the
machines managed by TripleO. This will need to change as we can’t stay
on a single OS release indefinitely – we’ll need to perform a major
OS upgrade. The intention is for the TripleO tooling to help with the
OS upgrade significantly, rather than leaving this task entirely to
the operator.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;We need to upgrade undercloud and overcloud machines to a new release
of the operating system.&lt;/p&gt;
&lt;p&gt;We would like to provide an upgrade procedure both for environments
where Nova and Ironic are managing the overcloud servers, and
“Deployed Server” environments where we don’t have control over
provisioning.&lt;/p&gt;
&lt;p&gt;Further constraints are imposed by Pacemaker clusters: Pacemaker is
non-containerized, so it is upgraded via packages together with the
OS. While Pacemaker would be capable of a rolling upgrade, Corosync
also changes major version, and starts to rely on knet for the link
protocol layer, which is incompatible with the previous version of
Corosync. This introduces additional complexity: we can’t do OS
upgrade in a rolling fashion naively on machines which belong to the
Pacemaker cluster (controllers).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change-high-level-view"&gt;
&lt;h2&gt;Proposed Change - High Level View&lt;/h2&gt;
&lt;p&gt;The Pacemaker constraints will be addressed by performing a one-by-one
(though not rolling) controller upgrade – temporarily switching to a
single-controller cluster on the new OS, and gradually upgrading the
rest. This will also require implementation of persistent OpenStack
data transfer from older to newer OS releases (to preserve uptime and
for easier recoverability in case of failure).&lt;/p&gt;
&lt;p&gt;We will also need to ensure that at least 2 ceph-mon services run at
all times, so ceph-mon services will keep running even after we switch
off Pacemaker and OpenStack on the 2 older controllers.&lt;/p&gt;
&lt;p&gt;We should scope two upgrade approaches: full reprovisioning, and
in-place upgrade via an upgrade tool. Each comes with different
benefits and drawbacks. The proposed CLI workflows should ideally be
generic enough to allow picking the final preferred approach of
overcloud upgrade late in the release cycle.&lt;/p&gt;
&lt;p&gt;While the overcloud approach is still wide open, undercloud seems to
favor an in-place upgrade due to not having a natural place to persist
the data during reprovisioning (e.g. we can’t assume overcloud
contains Swift services), but that could be overcome by making the
procedure somewhat more manual and shifting some tasks onto the
operator.&lt;/p&gt;
&lt;p&gt;The most viable way of achieving an in-place (no reprovisioning)
operating system upgrade currently seems to be &lt;a class="reference external" href="https://leapp-to.github.io/"&gt;Leapp&lt;/a&gt;, “an app
modernization framework”, which should include in-place upgrade
capabilities.&lt;/p&gt;
&lt;p&gt;Points in favor of in-place upgrade:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;While some data will need to be persisted and restored regardless of
approach taken (to allow safe one-by-one upgrade), reprovisioning
may also require managing data which would otherwise persist on its
own during an in-place upgrade.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In-place upgrade allows using the same approach for Nova+Ironic and
Deployed Server environments. If we go with reprovisioning, on
Deployed Server environments the operator will have to reprovision
using their own tooling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Environments with a single controller will need a different DB
mangling procedure. Instead of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;system_upgrade_transfer_data&lt;/span&gt;&lt;/code&gt; step
below, their DB data will be included in the persist/restore
operations when reprovisioning the controller.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Points in favor of reprovisioning:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Not having to integrate with an external in-place upgrade tool. E.g. in
case of CentOS, there’s currently not much info available about
in-place upgrade capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Allows making changes which wouldn’t otherwise be possible,
e.g. changing a filesystem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reprovisioning brings nodes to a clean state. Machines which are
continuously upgraded without reprovisioning can potentially
accumulate unwanted artifacts, resulting in an increased number of
problems/bugs which only appear after an upgrade, but not on fresh
deployments.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change-operator-workflow-view"&gt;
&lt;h2&gt;Proposed Change - Operator Workflow View&lt;/h2&gt;
&lt;p&gt;The following is an example of the expected upgrade workflow in a
deployment with roles: &lt;strong&gt;ControllerOpenstack, Database, Messaging,
Networker, Compute, CephStorage&lt;/strong&gt;. It’s formulated in a
documentation-like manner so that we can best imagine how this is
going to work from the operator’s point of view.&lt;/p&gt;
&lt;section id="upgrading-the-undercloud"&gt;
&lt;h3&gt;Upgrading the Undercloud&lt;/h3&gt;
&lt;p&gt;The in-place undercloud upgrade using Leapp will likely consist of the
following steps. First, prepare for OS upgrade via Leapp, downloading
the necessary packages:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;leapp&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then reboot, which will upgrade the OS:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;reboot&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then run the undercloud upgrade, which will bring back the undercloud
services (using the newer OpenStack release):&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;tripleo&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="n"&gt;prepare&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="n"&gt;containers&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;prepare&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;parameter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;undercloud&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If we wanted or needed to upgrade the undercloud via reprovisioning,
we would use a &lt;a class="reference external" href="http://tripleo.org/install/controlplane_backup_restore/00_index.html"&gt;backup and restore&lt;/a&gt; procedure as currently
documented, with restore perhaps being utilized just partially.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrading-the-overcloud"&gt;
&lt;h3&gt;Upgrading the Overcloud&lt;/h3&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Update the Heat stack&lt;/strong&gt;, generate Heat outputs for building
upgrade playbooks:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;prepare&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DEPLOY&lt;/span&gt; &lt;span class="n"&gt;ARGS&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;&amp;lt;DEPLOY&lt;/span&gt; &lt;span class="pre"&gt;ARGS&amp;gt;&lt;/span&gt;&lt;/code&gt; should include
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;containers-prepare-parameter.yaml&lt;/span&gt;&lt;/code&gt;, which brings in the containers
of the newer OpenStack release.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prepare an OS upgrade on one machine from each of the
“schema-/cluster-sensitive” roles&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_prepare&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This stops all services on the nodes selected.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For external installers like Ceph, we’ll have a similar
external-upgrade command, which can e.g. remove the nodes from
the Ceph cluster:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;external&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_prepare&lt;/span&gt; \
    &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_nodes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If we use in-place upgrade:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This will run the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;leapp&lt;/span&gt; &lt;span class="pre"&gt;upgrade&lt;/span&gt;&lt;/code&gt; command. It should use
newer OS and newer OpenStack repos to download packages, and
leave the node ready to reboot into the upgrade process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caution: Any reboot after this is done on a particular node
will cause that node to automatically upgrade to newer OS.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If we reprovision:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This should persist the node’s important data to the
undercloud. (Only node-specific data. It would not include
e.g. MariaDB database content, which would later be transferred
from one of the other controllers instead.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Services can export their &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;upgrade_tasks&lt;/span&gt;&lt;/code&gt; to do the
persistence; we should provide an Ansible module or role to
make it DRY.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upload new overcloud base image&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="n"&gt;upload&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;existing&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;For Nova+Ironic environments only. After this step any new or
reprovisioned nodes will receive the new OS.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run an OS upgrade on one node from each of the
“schema-/cluster-sensitive” roles&lt;/strong&gt; or &lt;strong&gt;reprovision those nodes&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Only if we do reprovisioning:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;rebuild&lt;/span&gt; &lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;rebuild&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;rebuild&lt;/span&gt; &lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;admin&lt;/span&gt; &lt;span class="n"&gt;authorize&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ssh&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ssh&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ssh&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;network&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ssh&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Both reprovisioning and in-place:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This step either performs a reboot of the nodes and lets Leapp
upgrade them to newer OS, or reimages the nodes with a fresh new
OS image. After they come up, they’ll have newer OS but no
services running. The nodes can be checked before continuing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In case of reprovisioning:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;admin&lt;/span&gt; &lt;span class="pre"&gt;authorize&lt;/span&gt;&lt;/code&gt; command will ensure that the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-admin&lt;/span&gt;&lt;/code&gt; user exists and authorize Mistral’s SSH keys for
connection to the newly provisioned nodes. The
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--overcloud-ssh-*&lt;/span&gt;&lt;/code&gt; options work the same as for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud&lt;/span&gt;
&lt;span class="pre"&gt;deploy&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--tags&lt;/span&gt; &lt;span class="pre"&gt;system_upgrade_run&lt;/span&gt;&lt;/code&gt; is still necessary because it
will restore the node-specific data from the undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Services can export their &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;upgrade_tasks&lt;/span&gt;&lt;/code&gt; to do the
restoration; we should provide an Ansible module or role to
make it DRY.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ceph-mon count is reduced by 1 (from 3 to 2 in most
environments).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caution: This will have bad consequences if run by accident on
unintended nodes, e.g. on all nodes in a single role. If
possible, it should refuse to run if &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--limit&lt;/span&gt;&lt;/code&gt; is not specified. If
possible further, it should refuse to run if a full role is
included, rather than individual nodes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
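&lt;p&gt;The two safeguards in the caution above could look roughly like the
following. This is only a sketch: the function name, its arguments, and the
case-insensitive comparison of role names are assumptions made for
illustration and are not part of the existing CLI.&lt;/p&gt;

```python
def check_system_upgrade_limit(limit, role_names):
    """Return the node names in ``limit``, or raise ValueError if unsafe.

    Hypothetical helper: ``limit`` is the raw --limit string (or None) and
    ``role_names`` is the list of role names known to the deployment.
    """
    # Split the comma-separated value the same way it is written on the CLI.
    entries = [e.strip() for e in (limit or "").split(",") if e.strip()]
    if not entries:
        raise ValueError("--limit is required for system_upgrade_run")

    # Assumption of this sketch: role names are matched case-insensitively.
    known_roles = {r.lower() for r in role_names}
    whole_roles = [e for e in entries if e.lower() in known_roles]
    if whole_roles:
        raise ValueError(
            "refusing to run on whole roles: " + ", ".join(whole_roles))
    return entries
```

&lt;p&gt;With the roles from this example, a limit of
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;controller-openstack-0,database-0&lt;/span&gt;&lt;/code&gt; would pass, while an empty limit or
one containing &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Database&lt;/span&gt;&lt;/code&gt; would be rejected.&lt;/p&gt;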
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stop services on older OS and transfer data to newer OS&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;external&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_transfer_data&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="n"&gt;ControllerOpenstack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Messaging&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This is where control plane downtime starts.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Here we should:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Detect which nodes are on older OS and which are on newer OS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fail if we don’t find &lt;em&gt;at least one&lt;/em&gt; older OS and &lt;em&gt;exactly
one&lt;/em&gt; newer OS node in each role.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On older OS nodes, stop all services except ceph-mon. (On newer
node, no services are running yet.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transfer data from &lt;em&gt;an&lt;/em&gt; older OS node (simply the first one in
the list we detect, or do we need to be more specific?) to
&lt;em&gt;the&lt;/em&gt; newer OS node in a role. This is probably only going to
do anything on the Database role which includes DBs, and will
be a no-op for others.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Services can export their &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;external_upgrade_tasks&lt;/span&gt;&lt;/code&gt; for the
persist/restore operations; we’ll provide an Ansible module or
role to make it DRY. The transfer will likely go via undercloud
initially, but it would be nice to make it direct in order to
speed it up.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
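&lt;p&gt;The detection and failure conditions listed above can be sketched as
follows for a single role. The function name, the shape of the input mapping,
and the release labels are hypothetical; picking the first older node by name
is just one possible answer to the open question in the notes.&lt;/p&gt;

```python
def pick_transfer_nodes(os_by_node, old_os, new_os):
    """Return (source, target) node names for one role, or raise ValueError.

    ``os_by_node`` maps each node name in the role to the OS release that
    was detected on it (hypothetical input format).
    """
    older = sorted(n for n, rel in os_by_node.items() if rel == old_os)
    newer = sorted(n for n, rel in os_by_node.items() if rel == new_os)

    # At least one older OS node is needed as the data source.
    if not older:
        raise ValueError("no node on the older OS to transfer data from")
    # Exactly one newer OS node is expected as the data target.
    if len(newer) != 1:
        raise ValueError(
            "expected exactly one node on the newer OS, found %d" % len(newer))

    # Assumption of this sketch: use the first older node in name order.
    return older[0], newer[0]
```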
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run the usual upgrade tasks on the newer OS nodes&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Control plane downtime stops at the end of this step.&lt;/strong&gt; This
means the control plane downtime spans two commands. We should
&lt;em&gt;not&lt;/em&gt; make it one command because the commands use different
parts of upgrade framework underneath, and the separation will
mean easier re-running of individual parts, should they fail.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Here we start pcmk cluster and all services on the newer OS
nodes, using the data previously transferred from the older OS
nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Likely we won’t need any special per-service upgrade tasks,
unless we discover we need some data conversions or
adjustments. All services on the node will be stopped after the
upgrade to the newer OS, so we’ll likely be effectively “setting up a
fresh cloud on pre-existing data”.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caution: At this point the newer OS nodes become the authority on
data state. Do not re-run the previous data transfer step after
services have started on newer OS nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(Currently &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;upgrade&lt;/span&gt; &lt;span class="pre"&gt;run&lt;/span&gt;&lt;/code&gt; has &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--nodes&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--roles&lt;/span&gt;&lt;/code&gt; which
both function the same, as Ansible &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--limit&lt;/span&gt;&lt;/code&gt;. Notably, nothing
stops you from passing role names to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--nodes&lt;/span&gt;&lt;/code&gt; and vice
versa. Maybe it’s time to retire those two and implement
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--limit&lt;/span&gt;&lt;/code&gt; to match the concept from Ansible closely.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Perform any service-specific and node-specific external upgrades,
most importantly Ceph&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;external&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_run&lt;/span&gt; \
    &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_nodes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Ceph-ansible here runs on a single node and spawns a new version
of ceph-mon. Per-node run capability will need to be added to
ceph-ansible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ceph-mon count is restored here (in most environments, it means
going from 2 to 3).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upgrade the remaining control plane nodes&lt;/strong&gt;. Perform all the
previous control plane upgrade steps for the remaining controllers
too. Two important notes here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Do not run the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;system_upgrade_transfer_data&lt;/span&gt;&lt;/code&gt; step anymore.&lt;/strong&gt;
The remaining controllers are expected to join the cluster and
sync the database data from the primary controller via DB
replication mechanism, no explicit data transfer should be
necessary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To have the necessary number of ceph-mons running at any given
time (often that means 2 out of 3), the controllers (ceph-mon
nodes) should be upgraded one-by-one.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After this step is finished, all of the nodes which are sensitive
to Pacemaker version or DB schema version should be upgraded to
newer OS, newer OpenStack, and newer ceph-mons.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upgrade the rest of the overcloud nodes&lt;/strong&gt; (Compute, Networker,
CephStorage), &lt;strong&gt;either one-by-one or in batches&lt;/strong&gt;, depending on
uptime requirements of particular nodes. E.g. for computes this
would mean evacuating and then also running:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_prepare&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="n"&gt;novacompute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="n"&gt;novacompute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="n"&gt;novacompute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ceph OSDs can be removed by the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;external-upgrade&lt;/span&gt; &lt;span class="pre"&gt;run&lt;/span&gt; &lt;span class="pre"&gt;--tags&lt;/span&gt;
&lt;span class="pre"&gt;system_upgrade_prepare&lt;/span&gt;&lt;/code&gt; step before reprovisioning, and after
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;upgrade&lt;/span&gt; &lt;span class="pre"&gt;run&lt;/span&gt;&lt;/code&gt; command, ceph-ansible can recreate the OSD via
the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;external-upgrade&lt;/span&gt; &lt;span class="pre"&gt;run&lt;/span&gt; &lt;span class="pre"&gt;--tags&lt;/span&gt; &lt;span class="pre"&gt;system_upgrade_run&lt;/span&gt;&lt;/code&gt; step,
always limited to the OSD being upgraded:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="c1"&gt;# Remove OSD&lt;/span&gt;
&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;external&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_prepare&lt;/span&gt; \
    &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_nodes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;novacompute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="c1"&gt;# &amp;lt;&amp;lt;Here the node is reprovisioned and upgraded&amp;gt;&amp;gt;&lt;/span&gt;

&lt;span class="c1"&gt;# Re-deploy OSD&lt;/span&gt;
&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;external&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_run&lt;/span&gt; \
    &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_nodes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;novacompute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Perform online upgrade&lt;/strong&gt; (online data migrations) after all nodes
have been upgraded:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;external&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;online_upgrade&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Perform upgrade converge&lt;/strong&gt; to re-assert the overcloud state:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;converge&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DEPLOY&lt;/span&gt; &lt;span class="n"&gt;ARGS&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Clean up upgrade data persisted on undercloud&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;external&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; \
    &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;system_upgrade_cleanup&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
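&lt;p&gt;The per-node part of the workflow above always issues the same two commands, with the reprovisioning happening between them. A minimal sketch of generating those invocations follows. The helper name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;node_upgrade_commands&lt;/span&gt;&lt;/code&gt; is hypothetical and only illustrates the ordering; the tags and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;system_upgrade_nodes&lt;/span&gt;&lt;/code&gt; variable are the ones used in the commands above.&lt;/p&gt;

```python
import shlex

# Common prefix of every external upgrade invocation shown in the spec.
BASE = ["openstack", "overcloud", "external-upgrade", "run"]


def node_upgrade_commands(node):
    """Return the prepare and run invocations for one node, in order.

    Hypothetical helper: the node is expected to be reprovisioned and
    upgraded between the first and the second command.
    """
    limit = ["-e", "system_upgrade_nodes=" + node]
    return [
        BASE + ["--tags", "system_upgrade_prepare"] + limit,
        BASE + ["--tags", "system_upgrade_run"] + limit,
    ]


if __name__ == "__main__":
    for argv in node_upgrade_commands("novacompute-0"):
        print(shlex.join(argv))
```

&lt;p&gt;Building argument lists rather than shell strings keeps node names from being reinterpreted by a shell if the commands are later passed to a process runner.&lt;/p&gt;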
&lt;/section&gt;
&lt;section id="additional-notes-on-data-persist-restore"&gt;
&lt;h3&gt;Additional notes on data persist/restore&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;There are two different use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Persistence for things that need to survive reprovisioning (for
each node)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transfer of DB data from node to node (just once to bootstrap the
first new OS node in a role)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a class="reference external" href="https://docs.ansible.com/ansible/latest/modules/synchronize_module.html"&gt;synchronize Ansible module&lt;/a&gt; shipped with Ansible seems
fitting. We could wrap it in a role to handle common logic, and
execute the role via &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;include_role&lt;/span&gt;&lt;/code&gt; from
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;upgrade_tasks&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We would persist the temporary data on the undercloud under a
directory accessible only by the user which runs the upgrade
playbooks (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mistral&lt;/span&gt;&lt;/code&gt; user). The root dir could be
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/var/lib/tripleo-upgrade&lt;/span&gt;&lt;/code&gt; and underneath would be subdirs for
individual nodes, and one more subdir level for services.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;(Undercloud’s Swift also comes to mind as a potential place for
storage. However, it would probably add more complexity than
benefit.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The data persist/restore operations within the upgrade do not
supplement or replace backup/restore procedures which should be
performed by the operator, especially before upgrading.&lt;/strong&gt; The
automated data persistence is solely for upgrade purposes, not for
disaster recovery.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
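&lt;p&gt;The directory layout described above (a root directory, one subdirectory per node, and one more level per service, accessible only by the user running the playbooks) can be sketched as follows. This is an illustration under stated assumptions, not the role itself: the function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;persist_dir&lt;/span&gt;&lt;/code&gt; and the example node and service names are hypothetical, and a temporary directory stands in for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/var/lib/tripleo-upgrade&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;

```python
import os
import stat
import tempfile
from pathlib import Path


def persist_dir(root, node, service):
    """Create and return ROOT/NODE/SERVICE with owner-only permissions."""
    leaf = Path(root) / node / service
    # Create each level explicitly so that every directory ends up with
    # mode 0700, not only the innermost one.
    for level in (Path(root), Path(root) / node, leaf):
        level.mkdir(exist_ok=True)
        os.chmod(level, 0o700)
    return leaf


if __name__ == "__main__":
    # A temporary directory stands in for /var/lib/tripleo-upgrade.
    demo_root = tempfile.mkdtemp()
    target = persist_dir(demo_root, "controller-0", "mysql")
    print(target, oct(stat.S_IMODE(target.stat().st_mode)))
```

&lt;p&gt;Setting the mode with an explicit chmod avoids depending on the umask of the process that runs the playbooks.&lt;/p&gt;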
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parallel cloud migration.&lt;/strong&gt; We could declare the in-place upgrade
of operating system + OpenStack as too risky and complex and time
consuming, and recommend standing up a new cloud and transferring
content to it. However, this brings its own set of challenges.&lt;/p&gt;
&lt;p&gt;This option is already available for anyone whose environment is
constrained such that normal upgrade procedure is not realistic,
e.g. in case of extreme uptime requirements or extreme risk-aversion
environments.&lt;/p&gt;
&lt;p&gt;Implementing parallel cloud migration is probably best handled on a
per-environment basis, and TripleO doesn’t provide any automation in
this area.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upgrading the operating system separately from OpenStack.&lt;/strong&gt; This
would simplify things on several fronts, but separating the
operating system upgrade while preserving uptime (i.e. upgrading the
OS in a rolling fashion node-by-node) currently seems not realistic
due to:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The pacemaker cluster (corosync) limitations mentioned earlier. We
would have to containerize Pacemaker (even if just ad-hoc
non-productized image).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Either we’d have to make OpenStack (and dependencies) compatible
with OS releases in a way we currently do not intend, or at least
ensure such compatibility when running containerized. E.g. for
data transfer, we could then probably use Galera native
replication.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OS release differences might be too large. E.g. in case of
differing container runtimes, we might have to make t-h-t be able
to deploy on two runtimes within one deployment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upgrading all control plane nodes at the same time as we’ve been
doing so far.&lt;/strong&gt; This is not entirely impossible, but rebooting all
controllers at the same time to do the upgrade could mean total
ceph-mon unavailability. Also, given that the upgraded nodes are
unreachable via ssh for some time, should something go wrong and the
nodes get stuck in that state, it could be difficult to recover back
into a working cloud.&lt;/p&gt;
&lt;p&gt;This is probably not realistic, mainly due to concerns around Ceph
mon availability and risk of bricking the cloud.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;How we transfer data from older OS machines to newer OS machines is
a potential security concern.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The same security concern applies for per-node data persist/restore
procedure in case we go with reprovisioning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The stored data may include overcloud node’s secrets and should be
cleaned up from the undercloud when no longer needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In case of using the &lt;a class="reference external" href="https://docs.ansible.com/ansible/latest/modules/synchronize_module.html"&gt;synchronize Ansible module&lt;/a&gt;: it uses rsync
over ssh, and we would store any data on undercloud in a directory
only accessible by the same user which runs the upgrade playbooks
(&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mistral&lt;/span&gt;&lt;/code&gt;). This undercloud user has full control over overcloud
already, via ssh keys authorized for all management operations, so
this should not constitute a significant expansion of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;mistral&lt;/span&gt;&lt;/code&gt;
user’s knowledge/capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The upgrade procedure is riskier and more complex.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;More things can potentially go wrong.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It will take more time to complete, both manually and
automatically.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Given that we upgrade one of the controllers while the other two are
still running, the control plane services downtime could be slightly
shorter than before.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When control plane services are stopped on older OS machines and
running on newer OS machine, we create a window without high
availability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The upgrade framework might need some tweaks, but at a high level it
seems we’ll be able to fit the workflow into it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All the upgrade steps should be idempotent, rerunnable and
recoverable as much as we can make them so.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Floating IP availability could be affected. Neutron upgrade
procedure typically doesn’t immediately restart sidecar containers
of L3 agent. Restarting will be a must if we upgrade the OS.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;When control plane services are stopped on older OS machines and
running on newer OS machine, only one controller is available to
serve all control plane requests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Depending on role/service composition of the overcloud, the reduced
throughput could also affect tenant traffic, not just control plane
APIs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Automating such procedure introduces some code which had better not
be executed by accident. The external upgrade tasks which are tagged
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;system_upgrade_*&lt;/span&gt;&lt;/code&gt; should also be tagged &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;never&lt;/span&gt;&lt;/code&gt;, so that they
only run when explicitly requested.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the data transfer step specifically, we may also introduce a
safety “flag file” on the target overcloud node, which would prevent
re-running of the data transfer until the file is manually removed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
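&lt;p&gt;The safety “flag file” idea for the data transfer step could look roughly like the sketch below. It assumes the flag is written only after a successful transfer and that its presence blocks any re-run until an operator removes it; the names &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;transfer_once&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;TransferBlocked&lt;/span&gt;&lt;/code&gt; are hypothetical, and the spec does not fix the flag file location.&lt;/p&gt;

```python
from pathlib import Path


class TransferBlocked(RuntimeError):
    """Raised when the flag file shows that the transfer already ran."""


def transfer_once(flag_file, transfer):
    """Run transfer() unless flag_file exists; create the flag on success."""
    flag = Path(flag_file)
    if flag.exists():
        raise TransferBlocked(
            "flag file %s exists; remove it manually to re-run" % flag
        )
    result = transfer()
    # Only mark the node after the transfer finished without raising, so a
    # failed attempt can simply be retried.
    flag.write_text("data transfer completed\n")
    return result
```

&lt;p&gt;Writing the flag after the transfer, rather than before it, is a design choice of this sketch: it favours easy retries after a failure over protection against a half-finished transfer being repeated.&lt;/p&gt;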
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers who work on specific composable services in TripleO will
need to get familiar with the new upgrade workflow.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="main-risks"&gt;
&lt;h3&gt;Main Risks&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Leapp has been somewhat explored but its viability/readiness for our
purpose is still not 100% certain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CI testing will be difficult; if we go with Leapp it might be
impossible (more below).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Time required to implement everything may not fit within the release
cycle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We have some idea how to do the data persist/restore/transfer parts,
but some prototyping needs to be done there to gain confidence.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We don’t know exactly what data needs to be persisted during
reprovisioning.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl&gt;
&lt;dt&gt;Primary assignees:&lt;/dt&gt;&lt;dd&gt;&lt;div class="line-block"&gt;
&lt;div class="line"&gt;jistr, chem, jfrancoa&lt;/div&gt;
&lt;/div&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;div class="line-block"&gt;
&lt;div class="line"&gt;fultonj for Ceph&lt;/div&gt;
&lt;/div&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;With additional info in the format: (how much we know about this task,
estimate of implementation difficulty).&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;(semi-known, est. as medium) Change tripleo-heat-templates +
puppet-tripleo to be able to set up a cluster on just one controller
(with newer OS) while the Heat stack knows about all
controllers. This is currently not possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(semi-known, est. as medium) Amend upgrade_tasks to work for
Rocky-&amp;gt;Stein with OS upgrade.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;system_upgrade_transfer_data&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;(unknown, est. as easy) Detect upgraded vs. unupgraded machines to
transfer data to/from.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(known, est. as easy) Stop all services on the unupgraded machines
that we transfer data from. (Needs to be done via external upgrade
tasks which is new, but likely not much different from what we’ve
been doing.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(semi-known, est. as medium/hard) Implement an Ansible role for
transferring data from one node to another via undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(unknown, est. as medium) Figure out which data needs transferring
from old controller to new, implement it using the above Ansible
role – we expect only MariaDB to require this, any special
services should probably be tackled by service squads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(semi-known, est. as medium/hard) Implement Ceph specifics, mainly
how to upgrade one node (mon, OSD, …) at a time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(unknown, either easy or hacky or impossible :) ) Implement
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--limit&lt;/span&gt;&lt;/code&gt; for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;external-upgrade&lt;/span&gt; &lt;span class="pre"&gt;run&lt;/span&gt;&lt;/code&gt;. (As external upgrade runs
on undercloud by default, we’ll need to use &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;delegate_to&lt;/span&gt;&lt;/code&gt; or
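nested Ansible for overcloud nodes. I’m not sure how well &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--limit&lt;/span&gt;&lt;/code&gt;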
will play with this.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(known, est. as easy) Change update/upgrade CLI from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--nodes&lt;/span&gt;&lt;/code&gt;
and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--roles&lt;/span&gt;&lt;/code&gt; to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--limit&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(semi-known, est. as easy/medium) Add &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;-e&lt;/span&gt;&lt;/code&gt; variable pass-through
support to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;external-upgrade&lt;/span&gt; &lt;span class="pre"&gt;run&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(unknown, unknown) Test as much as we can in CI – integrate with
tripleo-upgrade and OOOQ.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For reprovisioning:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;(semi-known, est. as medium) Implement &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;admin&lt;/span&gt;
&lt;span class="pre"&gt;authorize&lt;/span&gt;&lt;/code&gt;. Should take &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--stack&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--limit&lt;/span&gt;&lt;/code&gt;,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--overcloud-ssh-*&lt;/span&gt;&lt;/code&gt; params.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(semi-known, est. as medium/hard) Implement an Ansible role for
temporarily persisting overcloud nodes’ data on the undercloud and
restoring it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(known, est. as easy) Implement &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;external-upgrade&lt;/span&gt; &lt;span class="pre"&gt;run&lt;/span&gt; &lt;span class="pre"&gt;--tags&lt;/span&gt;
&lt;span class="pre"&gt;system_upgrade_cleanup&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(unknown, est. as hard in total, but should probably be tackled by
service squads) Figure out which data needs persisting for
particular services and implement the persistence using the above
Ansible role.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For in-place:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;(semi-known, est. as easy) Calls to Leapp in
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;system_upgrade_prepare&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;system_upgrade_run&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(semi-known, est. as medium) Implement a Leapp actor to set up or
use the repositories we need.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;For in-place: Leapp tool being ready to upgrade the OS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changes to ceph-ansible might be necessary to make it possible to
run it on a single node (for upgrading mons and OSDs node-by-node).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Testing is one of the main estimated pain areas. This is a traditional
problem with upgrades, but it’s even more pronounced for OS upgrades.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Since we do all the OpenStack infra cloud testing of TripleO on
CentOS 7 currently, it would make sense to test an upgrade to
CentOS 8. However, CentOS 8 is nonexistent at the time of writing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is unclear when Leapp will be ready for testing an upgrade from
CentOS 7, and it’s probably the only thing we’d be able to execute
in CI. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;server&lt;/span&gt; &lt;span class="pre"&gt;rebuild&lt;/span&gt;&lt;/code&gt; alternative is probably not
easily executable in CI, at least not in OpenStack infra clouds. We
might be able to emulate reprovisioning by wiping data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even if we find a way to execute the upgrade in CI, it might still
take too long to make the testing plausible for validating patches.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Upgrade docs will need to be amended. The above spec is written mainly
from the perspective of expected operator workflow, so it should be a
good starting point.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://leapp-to.github.io/"&gt;Leapp&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://leapp-to.github.io/actors"&gt;Leapp actors&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://leapp-to.github.io/architecture"&gt;Leapp architecture&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-ptg-stein"&gt;Stein PTG etherpad&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://tripleo.org/install/controlplane_backup_restore/00_index.html"&gt;backup and restore&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.ansible.com/ansible/latest/modules/synchronize_module.html"&gt;synchronize Ansible module&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Tue, 04 Dec 2018 00:00:00 </pubDate></item><item><title>TripleO - Pattern to safely spawn a container from a container</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/safe-side-containers.html</link><description>
 
&lt;p&gt;This spec describes a pattern which can be used as an alternative to
what TripleO does today to allow certain containers (Neutron, etc.) to
spawn side processes which require special privs like network
namespaces. Specifically it avoids exposing the docker socket or
using Podman nsenter hacks that have recently entered the codebase in Stein.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;In Queens TripleO implemented a containerized architecture with the goal of
containerizing all OpenStack services. This architecture was a success but
a few applications had regressions when compared with their baremetal deployed
equivalent. One of these applications was Neutron, which requires the ability
to spawn long lived “side” processes that are launched directly from the
Neutron agents themselves. In the original Queens architecture Neutron
launched these side processes inside of the agent container itself which
caused a service disruption if the neutron agents themselves were restarted.
This was previously not the case on baremetal as these processes would continue
running across an agent restart/upgrade.&lt;/p&gt;
&lt;p&gt;The workaround in Rocky was to add “wrapper” scripts for Neutron agents and
to expose the docker socket to each agent container. These wrapper scripts
were bind mounted into the containers so that they overwrote the normal location
of the side process. Using this crude mechanism, binaries like ‘dnsmasq’ and
‘haproxy’ would launch a shell script instead of the normal binary, and
these custom shell scripts relied on an exposed docker socket from the
host to be able to launch a side container with the same arguments supplied
to the script.&lt;/p&gt;
&lt;p&gt;This mechanism functionally solved the issues with our containerization but
exposed some security problems in that we were now exposing the ability to
launch any container to these Neutron agent containers (privileged containers
with access to a docker socket).&lt;/p&gt;
&lt;p&gt;In Stein things changed with our desire to support Podman. Unlike Docker,
Podman does not include a daemon on the host. All Podman commands are executed
via a CLI which runs the command on the host directly. We landed
patches which required Podman commands to use nsenter to enter the host’s
namespace and run the commands there directly. Again this mechanism requires
extra privileges to be granted to the Neutron agent containers in order for
them to be able to launch these commands. Furthermore the mechanism is
a bit cryptic to support and debug in the field.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Use systemd on the host to launch the side process containers directly with
support for network namespaces that Neutron agents require. The benefit of
this approach is that we no longer have to give the Neutron containers privs
to launch containers which they shouldn’t require.&lt;/p&gt;
&lt;p&gt;The pattern could work like this:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;A systemd.path file monitors a known location on the host for changes.
Example (neutron-dhcp-dnsmasq.path):&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Path&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="l l-Scalar l-Scalar-Plain"&gt;PathModified=/var/lib/neutron/neutron-dnsmasq-processes-timestamp&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="l l-Scalar l-Scalar-Plain"&gt;PathChanged=/var/lib/neutron/neutron-dnsmasq-processes-timestamp&lt;/span&gt;&lt;span class="w"/&gt;

&lt;span class="l l-Scalar l-Scalar-Plain"&gt;[Install]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="l l-Scalar l-Scalar-Plain"&gt;WantedBy=multi-user.target&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ol class="arabic simple" start="2"&gt;
&lt;li&gt;&lt;p&gt;When systemd.path notices a change it fires the service for this
path file:
Example (neutron-dhcp-dnsmasq.service):&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Unit&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Description=neutron dhcp dnsmasq sync service&lt;/span&gt;&lt;span class="w"/&gt;

&lt;span class="l l-Scalar l-Scalar-Plain"&gt;[Service]&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Type=oneshot&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ExecStart=/usr/local/bin/neutron-dhcp-dnsmasq-process-sync&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="l l-Scalar l-Scalar-Plain"&gt;User=root&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ol class="arabic simple" start="3"&gt;
&lt;li&gt;&lt;p&gt;We use the same “wrapper scripts” used today to write two files. The
first file is a dump of CLI arguments used to launch the process
on the host. This file can optionally include extra data like
network namespaces which are required for some neutron side processes.
The second file is a timestamp which is monitored by systemd.path
on the host for changes and is used as a signal that it needs to
process the first file with arguments.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ol class="arabic simple" start="4"&gt;
&lt;li&gt;&lt;p&gt;When a change is detected the systemd.service above executes a script on the
host to cleanly launch containerized side processes. When the script finishes
launching processes it truncates the file to start with a clean slate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Both the wrapper scripts and the host scripts use flock to eliminate race
conditions which could cause issues in relaunching or missed containers.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
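&lt;p&gt;The two-file handoff described above can be sketched as follows: the wrapper appends the launch arguments to a queue file under an exclusive flock and then touches the timestamp file, while the host-side script takes the same lock, launches every queued request, and truncates the queue. This is only an illustration under assumptions. The function names, the JSON-lines queue format, and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;launch&lt;/span&gt;&lt;/code&gt; callback are hypothetical; the spec does not define the file format or how the side container is actually started.&lt;/p&gt;

```python
import fcntl
import json
import time
from pathlib import Path


def queue_side_process(queue_file, timestamp_file, argv, netns=None):
    """Wrapper side: record one launch request, then signal the host."""
    entry = {"argv": list(argv), "netns": netns}
    with open(queue_file, "a") as queue:
        # Exclusive lock so the host script never reads a partial line.
        fcntl.flock(queue, fcntl.LOCK_EX)
        queue.write(json.dumps(entry) + "\n")
        queue.flush()
    # Closing the file released the lock. Updating the timestamp is the
    # change that systemd.path is watching for.
    Path(timestamp_file).write_text(str(time.time()))


def drain_queue(queue_file, launch):
    """Host side: launch every queued request, then truncate the queue."""
    with open(queue_file, "a+") as queue:
        fcntl.flock(queue, fcntl.LOCK_EX)
        queue.seek(0)
        entries = [json.loads(line) for line in queue if line.strip()]
        for entry in entries:
            # launch is a caller-supplied callable (hypothetical) that
            # starts the side container for this request.
            launch(entry)
        # Start with a clean slate while still holding the lock.
        queue.truncate(0)
    return len(entries)
```

&lt;p&gt;Because both sides lock the same queue file, a request written while the host script is running is either seen in that run or left intact for the next one, which is the race the flock is meant to close.&lt;/p&gt;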
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;With Podman an API like varlink would be an option; however, it would likely
still require exposure to a socket on the host, which would involve
extra privileges like what we have today. It would, however, avoid the
nsenter hacks.&lt;/p&gt;
&lt;p&gt;An architecture like Kubernetes would give us an API which could be used
to launch containers directly via the COE.&lt;/p&gt;
&lt;p&gt;Additionally an external process manager in Neutron that is “containers aware”
could be written to improve either of the above options.  The current Python
in Neutron was written primarily for launching processes on baremetal with
assumptions that some of the processes it launches are meant to live across
a container restart. Implementing a class that can launch side processes via a
clean interface rather than overwriting binaries would be desirable.
Classes which supported launching containers via Kubernetes and/or systemd
via the host directly could be supported.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;This mechanism should allow us to remove some of the container privileges for
neutron agents which in the past were used to execute containers. It is
a crude but more restrictive interface that allows the containers to launch only
a specific type of process rather than any container they choose.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;The side process containers should be the same regardless of how they are
launched so the upgrade should be minimal.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;dan-prince&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;emilienm&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Ansible playbook to create systemd files, wrappers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO Heat template updates to use the new playbooks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove/deprecate the old docker.socket and nsenter code from puppet-tripleo&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
</description><pubDate>Mon, 26 Nov 2018 00:00:00 </pubDate></item><item><title>Provision nodes without Nova and Glance</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/nova-less-deploy"&gt;https://blueprints.launchpad.net/tripleo/+spec/nova-less-deploy&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Currently TripleO undercloud uses Heat, Nova, Glance, Neutron and Ironic for
provisioning bare metal machines. This blueprint proposes excluding Heat, Nova
and Glance from this flow, removing Nova and Glance completely from the
undercloud.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Making TripleO workflows use Ironic directly to provision nodes has quite a few
benefits:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;First and foremost, getting rid of the horrible “no valid hosts found”
exception. The scheduling will be much simpler and the errors will be
clearer.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This and many other problems with using Nova in the undercloud come from
the fact that Nova is cloud-oriented software, while the undercloud is
more of a traditional installer. In the “pet vs cattle” metaphor, Nova
handles the “cattle” case, while the undercloud is the “pet” case.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Also important for the generic provisioner case, we’ll be able to get rid of
Nova and Glance, reducing the memory footprint.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We’ll get rid of pre-deploy validations that currently try to guess what
Nova scheduler will expect.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We’ll be able to combine nodes deployed by Ironic with pre-deployed servers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We’ll become in charge of building the configdrive, potentially putting more
useful things there.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hopefully, scale-up will be less error-prone.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Also in the future we may be able to:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Integrate things like building RAID on demand much more easily.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use introspection data in scheduling and provisioning decisions.
Particularly, we can automate handling root device hints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make Neutron optional and use static DHCP and/or &lt;em&gt;os-net-config&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;This blueprint proposes replacing the triad Heat-Nova-Glance with
Ironic driven directly by Mistral. To avoid placing Ironic-specific code into
tripleo-common, a new library &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt; has been developed and accepted into
the Ironic governance.&lt;/p&gt;
&lt;p&gt;As part of the implementation, this blueprint proposes completely separating the
bare metal provisioning process from software configuration, including the CLI
level. This has two benefits:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Having a clear separation between two error-prone processes simplifies
debugging for operators.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reusing the existing &lt;em&gt;deployed-server&lt;/em&gt; workflow simplifies the
implementation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In the distant future, the functionality of &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt; may be moved into
the Ironic API itself. In that case metalsmith will be phased out, while keeping the same
Mistral workflows.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="operator-workflow"&gt;
&lt;h3&gt;Operator workflow&lt;/h3&gt;
&lt;p&gt;As noted in &lt;a class="reference internal" href="#overview"&gt;Overview&lt;/a&gt;, the CLI/GUI workflow will be split into hardware
provisioning and software configuration parts (the former being optional).&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;In addition to existing Heat templates, a new file
&lt;a class="reference internal" href="#baremetal-deployment-yaml"&gt;baremetal_deployment.yaml&lt;/a&gt; will be populated by an operator with the bare
metal provisioning information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bare metal deployment will be conducted by a new CLI command or GUI
operation using the new &lt;a class="reference internal" href="#deploy-roles-workflow"&gt;deploy_roles workflow&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="n"&gt;provision&lt;/span&gt; \
   &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="n"&gt;baremetal_environment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt; &lt;span class="n"&gt;baremetal_deployment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This command will take the input from &lt;a class="reference internal" href="#baremetal-deployment-yaml"&gt;baremetal_deployment.yaml&lt;/a&gt;, provision
requested bare metal machines and output a Heat environment file
&lt;a class="reference internal" href="#baremetal-environment-yaml"&gt;baremetal_environment.yaml&lt;/a&gt; to use with the &lt;em&gt;deployed-server&lt;/em&gt; feature.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, the regular deployment is done, including the generated file:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;deploy&lt;/span&gt; \
   &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt; &lt;span class="n"&gt;cli&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; \
   &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="n"&gt;baremetal_environment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt; \
   &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;usr&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;share&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;heat&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;environments&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;deployed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt; \
   &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;usr&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;share&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;heat&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;environments&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;deployed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bootstrap&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;centos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt; \
   &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;usr&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;share&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;heat&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;deployed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;deployed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;roles&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For simplicity the two commands can be combined:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;deploy&lt;/span&gt; \
   &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt; &lt;span class="n"&gt;cli&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; \
   &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="n"&gt;baremetal_deployment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt; \
   &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;usr&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;share&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;heat&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;environments&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;deployed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt; \
   &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;usr&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;share&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;heat&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;environments&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;deployed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bootstrap&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;centos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt; \
   &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;usr&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;share&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;heat&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;templates&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;deployed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;deployed&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;roles&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The new argument &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--baremetal-deployment&lt;/span&gt;&lt;/code&gt;/&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;-b&lt;/span&gt;&lt;/code&gt; will accept the
&lt;a class="reference internal" href="#baremetal-deployment-yaml"&gt;baremetal_deployment.yaml&lt;/a&gt; and do the deployment automatically.&lt;/p&gt;
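&lt;p&gt;The two-step and combined invocations above can be sketched as argument lists, for example for use with a process runner. This is only an illustration: the helper names are hypothetical, and the flags and paths are copied from the commands shown in this section.&lt;/p&gt;

```python
import shlex

THT = "/usr/share/openstack-tripleo-heat-templates"

# Environment and roles files copied from the commands in this section.
DEPLOYED_SERVER_ARGS = [
    "-e", f"{THT}/environments/deployed-server-environment.yaml",
    "-e", f"{THT}/environments/deployed-server-bootstrap-environment-centos.yaml",
    "-r", f"{THT}/deployed-server/deployed-server-roles-data.yaml",
]


def provision_command(deployment_file, environment_file):
    """Step 1: provision bare metal and write the Heat environment file."""
    return ["openstack", "overcloud", "node", "provision",
            "-o", environment_file, deployment_file]


def deploy_command(environment_file, other_args=()):
    """Step 2: regular deployment that includes the generated environment."""
    return (["openstack", "overcloud", "deploy", *other_args,
             "-e", environment_file] + DEPLOYED_SERVER_ARGS)


def combined_command(deployment_file, other_args=()):
    """Single-step variant using the proposed -b argument."""
    return (["openstack", "overcloud", "deploy", *other_args,
             "-b", deployment_file] + DEPLOYED_SERVER_ARGS)


if __name__ == "__main__":
    commands = (
        provision_command("baremetal_deployment.yaml", "baremetal_environment.yaml"),
        deploy_command("baremetal_environment.yaml"),
        combined_command("baremetal_deployment.yaml"),
    )
    for command in commands:
        # Print a shell-quoted form of each argument list.
        print(shlex.join(command))
```

&lt;p&gt;Keeping the commands as lists avoids shell quoting problems when operator-supplied arguments are appended.&lt;/p&gt;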
&lt;/section&gt;
&lt;section id="breakdown-of-the-changes"&gt;
&lt;h3&gt;Breakdown of the changes&lt;/h3&gt;
&lt;p&gt;This section describes the required changes in depth.&lt;/p&gt;
&lt;section id="image-upload"&gt;
&lt;h4&gt;Image upload&lt;/h4&gt;
&lt;p&gt;As Glance will no longer be used, images will have to be served from other
sources. Ironic supports HTTP and file sources for its images. For the
undercloud case, the file source seems to be the most straightforward, although the
&lt;em&gt;Edge&lt;/em&gt; case may require using HTTP images.&lt;/p&gt;
&lt;p&gt;To make both cases possible, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;image&lt;/span&gt; &lt;span class="pre"&gt;upload&lt;/span&gt;&lt;/code&gt; command
will now copy the three overcloud images (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-full.qcow2&lt;/span&gt;&lt;/code&gt;,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-full.kernel&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-full.ramdisk&lt;/span&gt;&lt;/code&gt;) to
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/var/lib/ironic/httpboot/overcloud-images&lt;/span&gt;&lt;/code&gt;. This will allow referring to
images both via &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;file:///var/lib/ironic/httpboot/overcloud-images/...&lt;/span&gt;&lt;/code&gt; and
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;http(s)://&amp;lt;UNDERCLOUD&lt;/span&gt; &lt;span class="pre"&gt;HOST&amp;gt;:&amp;lt;IPXE&lt;/span&gt; &lt;span class="pre"&gt;PORT&amp;gt;/overcloud-images/...&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
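&lt;p&gt;As a rough illustration of the two ways to reference a copied image, the sketch below builds both URL forms from the directory named above. The function names are hypothetical, and the host and port in the usage note are placeholders rather than values defined by this spec.&lt;/p&gt;

```python
from pathlib import PurePosixPath

# Directory the upload command copies images to, as stated in this section.
IMAGE_DIR = PurePosixPath("/var/lib/ironic/httpboot/overcloud-images")


def file_url(image_name):
    """Local reference to an image stored in the httpboot directory."""
    return (IMAGE_DIR / image_name).as_uri()


def http_url(image_name, host, port, scheme="http"):
    """Remote reference to the same image, served over HTTP(S)."""
    return f"{scheme}://{host}:{port}/{IMAGE_DIR.name}/{image_name}"
```

&lt;p&gt;For example, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;http_url("overcloud-full.kernel", "192.0.2.1", 8088)&lt;/span&gt;&lt;/code&gt; uses a documentation-range address and an assumed port purely as placeholders.&lt;/p&gt;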
&lt;p&gt;Finally, a checksum file will be generated from the copied images using:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ironic&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;httpboot&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;
&lt;span class="n"&gt;md5sum&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;full&lt;/span&gt;&lt;span class="o"&gt;.*&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MD5SUMS&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is required since the checksums will no longer come from Glance.&lt;/p&gt;
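&lt;p&gt;If the checksum file had to be produced from Python rather than the shell, a minimal equivalent of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;md5sum&lt;/span&gt;&lt;/code&gt; call above might look like the following. The function names are hypothetical; the output mimics the usual "digest, two spaces, file name" layout.&lt;/p&gt;

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the MD5 digest of a file without loading it all in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5sums(image_dir, pattern="overcloud-full.*", output="MD5SUMS"):
    """Write one checksum line per matching image and return the lines."""
    image_dir = Path(image_dir)
    lines = [f"{md5_of(path)}  {path.name}"
             for path in sorted(image_dir.glob(pattern))]
    (image_dir / output).write_text("".join(line + "\n" for line in lines))
    return lines
```

&lt;p&gt;Reading in chunks matters here because overcloud images are typically far too large to hash in a single read.&lt;/p&gt;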
&lt;/section&gt;
&lt;section id="baremetal-deployment-yaml"&gt;
&lt;h4&gt;baremetal_deployment.yaml&lt;/h4&gt;
&lt;p&gt;This file will describe the bare metal provisioning parameters. It will
provide the information that is currently implicitly deduced from the Heat
templates.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;We could continue extracting it from the templates as well. However, a separate
file will avoid a dependency on any Heat-specific logic, potentially
benefiting standalone installer cases. It also provides the operators with
more control over the provisioning process.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The format of this file resembles that of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;roles_data&lt;/span&gt;&lt;/code&gt; file. It describes
the deployment parameters for each role. The file contains a list of roles,
each with a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;name&lt;/span&gt;&lt;/code&gt;. Other accepted parameters are:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;count&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;number of machines to deploy for this role. Defaults to 1.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;profile&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;profile (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;control&lt;/span&gt;&lt;/code&gt;, etc) to use for this role. Roughly
corresponds to a flavor name for a Nova based deployment. Defaults to no
profile (any node can be picked).&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hostname_format&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;a template for generating host names. This is similar to
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HostnameFormatDefault&lt;/span&gt;&lt;/code&gt; of a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;roles_data&lt;/span&gt;&lt;/code&gt; file and should use
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;%index%&lt;/span&gt;&lt;/code&gt; to number the nodes. The default is &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;%stackname%-&amp;lt;role&lt;/span&gt; &lt;span class="pre"&gt;name&lt;/span&gt; &lt;span class="pre"&gt;in&lt;/span&gt;
&lt;span class="pre"&gt;lower&lt;/span&gt; &lt;span class="pre"&gt;case&amp;gt;-%index%&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;instances&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;list of instances in the format accepted by &lt;a class="reference internal" href="#deploy-instances-workflow"&gt;deploy_instances workflow&lt;/a&gt;.
This allows tuning parameters per instance.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
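&lt;p&gt;The accepted parameters listed above lend themselves to a simple sanity check on the parsed file. The sketch below is not part of the proposal: the function name is hypothetical, and the rules that &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;count&lt;/span&gt;&lt;/code&gt; is a non-negative integer and that unknown keys are reported are assumptions.&lt;/p&gt;

```python
# Keys described in this section; anything else is reported (an assumption).
ACCEPTED_KEYS = {"name", "count", "profile", "hostname_format", "instances"}


def validate_roles(roles):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(roles, list):
        return ["top level must be a list of roles"]
    problems = []
    for position, role in enumerate(roles):
        if not isinstance(role, dict) or not role.get("name"):
            problems.append(f"role #{position}: missing name")
            continue
        name = role["name"]
        unknown = sorted(set(role) - ACCEPTED_KEYS)
        if unknown:
            problems.append(f"{name}: unknown keys {unknown}")
        count = role.get("count", 1)  # documented default is 1
        if isinstance(count, int) and count >= 0:
            fmt = role.get("hostname_format")
            # Without %index%, several nodes would share one host name.
            if fmt is not None and count > 1 and "%index%" not in fmt:
                problems.append(
                    f"{name}: hostname_format should contain %index% "
                    "when count exceeds 1")
        else:
            problems.append(f"{name}: count must be a non-negative integer")
    return problems
```

&lt;p&gt;Collecting all problems rather than failing on the first one lets an operator fix the whole file in one pass.&lt;/p&gt;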
&lt;section id="examples"&gt;
&lt;h5&gt;Examples&lt;/h5&gt;
&lt;p&gt;Deploy one compute and one control with any profile:&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Controller&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;HA deployment with two computes and profile matching:&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;2&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hostname_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-%index%.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Controller&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;3&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;hostname_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-%index%.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Advanced deployment with custom hostnames and parameters set per instance:&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-05.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;nics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;10.0.2.5&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;traits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;HW_CPU_X86_VMX&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-06.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;nics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;10.0.2.5&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;traits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;HW_CPU_X86_VMX&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Controller&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-1.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;swap_size_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;4096&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-2.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;swap_size_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;4096&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-3.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;swap_size_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;4096&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="deploy-roles-workflow"&gt;
&lt;h4&gt;deploy_roles workflow&lt;/h4&gt;
&lt;p&gt;The workflow &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo.baremetal_deploy.v1.deploy_roles&lt;/span&gt;&lt;/code&gt; will accept the
information from &lt;a class="reference internal" href="#baremetal-deployment-yaml"&gt;baremetal_deployment.yaml&lt;/a&gt;, convert it into the low-level
format accepted by the &lt;a class="reference internal" href="#deploy-instances-workflow"&gt;deploy_instances workflow&lt;/a&gt; and call the
&lt;a class="reference internal" href="#deploy-instances-workflow"&gt;deploy_instances workflow&lt;/a&gt; with it.&lt;/p&gt;
&lt;p&gt;It will accept the following mandatory input:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;roles&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;parsed &lt;a class="reference internal" href="#baremetal-deployment-yaml"&gt;baremetal_deployment.yaml&lt;/a&gt; file.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;It will accept one optional input:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;plan&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;plan/stack name, used for templating. Defaults to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;It will return the same output as the &lt;a class="reference internal" href="#deploy-instances-workflow"&gt;deploy_instances workflow&lt;/a&gt; plus:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;environment&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;the content of the generated &lt;a class="reference internal" href="#baremetal-environment-yaml"&gt;baremetal_environment.yaml&lt;/a&gt; file.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
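&lt;p&gt;The conversion step described above can be approximated in a few lines. This is a sketch under assumptions, not the workflow's actual code: the function name is hypothetical, and applying the role-level profile to explicitly listed instances (and ignoring &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;count&lt;/span&gt;&lt;/code&gt; when instances are given) is a guess at the intended behaviour.&lt;/p&gt;

```python
def expand_roles(roles, plan="overcloud"):
    """Flatten role descriptions into a list of per-instance dictionaries."""
    instances = []
    for role in roles:
        name = role["name"]
        profile = role.get("profile")
        if "instances" in role:
            # Explicit instances pass through; the role profile is applied
            # as a default (an assumption of this sketch).
            for instance in role["instances"]:
                item = dict(instance)
                if profile is not None:
                    item.setdefault("profile", profile)
                instances.append(item)
            continue
        # Documented default format: stack name, lower-case role name, index.
        fmt = role.get("hostname_format",
                       f"%stackname%-{name.lower()}-%index%")
        for index in range(role.get("count", 1)):
            hostname = (fmt.replace("%stackname%", plan)
                           .replace("%index%", str(index)))
            item = {"hostname": hostname}
            if profile is not None:
                item["profile"] = profile
            instances.append(item)
    return instances
```

&lt;p&gt;With the default plan name, two roles that only specify a name yield the host names &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-compute-0&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-controller-0&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;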
&lt;section id="id1"&gt;
&lt;h5&gt;Examples&lt;/h5&gt;
&lt;p&gt;The examples from &lt;a class="reference internal" href="#baremetal-deployment-yaml"&gt;baremetal_deployment.yaml&lt;/a&gt; will be converted to:&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;overcloud-compute-0&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;overcloud-controller-0&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-0.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-1.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-0.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-1.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-2.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-05.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;nics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;10.0.2.5&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;traits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;HW_CPU_X86_VMX&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-06.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;nics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;network&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;fixed_ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;10.0.2.6&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;traits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;HW_CPU_X86_VMX&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-1.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;swap_size_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;4096&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-2.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;swap_size_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;4096&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-3.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;control&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;swap_size_mb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;4096&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="deploy-instances-workflow"&gt;
&lt;h4&gt;deploy_instances workflow&lt;/h4&gt;
&lt;p&gt;The workflow &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo.baremetal_deploy.v1.deploy_instances&lt;/span&gt;&lt;/code&gt; is a thin wrapper
around the corresponding &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt; calls.&lt;/p&gt;
&lt;p&gt;The following inputs are mandatory:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;instances&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;list of requested instances in the format described in &lt;a class="reference internal" href="#instance-format"&gt;Instance format&lt;/a&gt;.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ssh_keys&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;list of SSH public key contents to put on the machines.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;The following inputs are optional:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ssh_user_name&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;SSH user name to create, defaults to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;heat-admin&lt;/span&gt;&lt;/code&gt; for compatibility.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;timeout&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;deployment timeout, defaults to 3600 seconds.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;concurrency&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;deployment concurrency - how many nodes to deploy at the same time. Defaults
to 20, which matches introspection.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
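&lt;p&gt;As an illustration only, a workflow input combining the mandatory and optional inputs above might look like the following sketch. The host names and the key contents are placeholders, and the optional values simply restate the defaults listed above.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;# Hypothetical input for tripleo.baremetal_deploy.v1.deploy_instances
instances:
  - hostname: compute-0.example.com
    profile: compute
  - hostname: controller-0.example.com
    profile: control
ssh_keys:
  # Placeholder, not a real key
  - ssh-rsa PLACEHOLDER-PUBLIC-KEY stack@undercloud
# Optional inputs, shown with their documented defaults
ssh_user_name: heat-admin
timeout: 3600
concurrency: 20
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;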
&lt;section id="instance-format"&gt;
&lt;h5&gt;Instance format&lt;/h5&gt;
&lt;p&gt;The instance record format closely follows that of the &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/user/ansible.html#instance"&gt;metalsmith Ansible
role&lt;/a&gt;, with only a few TripleO-specific additions and changed defaults.&lt;/p&gt;
&lt;p&gt;Either or both of the following fields must be present:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hostname&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;requested hostname. It is used to identify the deployed instance later on.
Defaults to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;name&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;name&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;name of the node to deploy on. If &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hostname&lt;/span&gt;&lt;/code&gt; is not provided, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;name&lt;/span&gt;&lt;/code&gt; is
also used as the hostname.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;The following fields will be supported:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;capabilities&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;requested node capabilities (except for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;profile&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;boot_option&lt;/span&gt;&lt;/code&gt;).&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;conductor_group&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;requested node’s conductor group. This is primarily for the &lt;em&gt;Edge&lt;/em&gt; case when
nodes managed by the same Ironic can be physically separated.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nics&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;list of requested NICs, see &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt; documentation for details. Defaults
to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;{"network":&lt;/span&gt; &lt;span class="pre"&gt;"ctlplane"}&lt;/span&gt;&lt;/code&gt; which requests creation of a port on the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ctlplane&lt;/span&gt;&lt;/code&gt; network.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;profile&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;profile to use (e.g. &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;compute&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;control&lt;/span&gt;&lt;/code&gt;, etc).&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;resource_class&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;requested node’s resource class, defaults to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;baremetal&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;root_size_gb&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;size of the root partition in GiB, defaults to 49.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;swap_size_mb&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;size of the swap partition in MiB, if needed.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;traits&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;list of requested node traits.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;whole_disk_image&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;boolean, whether to treat the image (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-full.qcow2&lt;/span&gt;&lt;/code&gt; or provided
through the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;image&lt;/span&gt;&lt;/code&gt; field) as a whole disk image. Defaults to false.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
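&lt;p&gt;To make the fields above concrete, a single instance record that sets most of them explicitly could be sketched as follows. The node name, conductor group and capability are hypothetical values chosen for illustration; entries marked as defaults repeat the defaults documented above and could be omitted.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;- hostname: compute-07.us-west.example.com
  name: rack1-node07          # hypothetical Ironic node name
  profile: compute
  conductor_group: us-west    # hypothetical conductor group
  resource_class: baremetal   # documented default
  root_size_gb: 49            # documented default
  swap_size_mb: 4096
  capabilities:
    boot_mode: uefi           # hypothetical; profile and boot_option do not belong here
  nics:
    - network: ctlplane       # documented default NIC request
  traits:
    - HW_CPU_X86_VMX
  whole_disk_image: false     # documented default
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;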
&lt;p&gt;The following fields will be supported, but the defaults should work for all
but the most extreme cases:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;image&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;file or HTTP URL of the root partition or whole disk image.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;image_kernel&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;file or HTTP URL of the kernel image (partition images only).&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;image_ramdisk&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;file or HTTP URL of the ramdisk image (partition images only).&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;image_checksum&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;checksum or URL of the checksum of the root partition or whole disk image.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
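&lt;p&gt;For the rare cases where the default image is not suitable, a record overriding the image fields might look like this sketch. It assumes a custom partition image served over HTTP; the server name and all paths are hypothetical.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;- hostname: controller-1.us-west.example.com
  profile: control
  # Partition image, so the kernel and ramdisk are supplied alongside it.
  # All URLs below are placeholders.
  image: http://images.example.com/custom-overcloud.qcow2
  image_kernel: http://images.example.com/custom-overcloud.vmlinuz
  image_ramdisk: http://images.example.com/custom-overcloud.initrd
  image_checksum: http://images.example.com/custom-overcloud.qcow2.md5
  whole_disk_image: false
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;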
&lt;/section&gt;
&lt;section id="certificate-authority-configuration"&gt;
&lt;h5&gt;Certificate authority configuration&lt;/h5&gt;
&lt;p&gt;If TLS is used in the undercloud, we need to make the nodes trust
the Certificate Authority (CA) that signed the TLS certificates.
If &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/etc/pki/ca-trust/source/anchors/cm-local-ca.pem&lt;/span&gt;&lt;/code&gt; exists, it will be
included in the generated configdrive, so that the file is copied into the same
location on target systems.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="outputs"&gt;
&lt;h5&gt;Outputs&lt;/h5&gt;
&lt;p&gt;The workflow will provide the following outputs:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ctlplane_ips&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;mapping of host names to their respective IP addresses on the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ctlplane&lt;/span&gt;&lt;/code&gt;
network.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;instances&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;mapping of host names to full instance representations with fields:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;node&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Ironic node representation.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ip_addresses&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;mapping of network names to list of IP addresses on them.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hostname&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;instance hostname.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;state&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/reference/api/metalsmith.html#metalsmith.Instance.state"&gt;metalsmith instance state&lt;/a&gt;.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;uuid&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Ironic node uuid.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Also two subdicts of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;instances&lt;/span&gt;&lt;/code&gt; are provided:&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;existing_instances&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;only instances that already existed.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;new_instances&lt;/span&gt;&lt;/code&gt;&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;only instances that were deployed.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Instances are distinguished by their hostnames.&lt;/p&gt;
&lt;/div&gt;
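&lt;p&gt;The shape of the outputs described above can be sketched as follows for a single newly deployed node. The UUID is a placeholder, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;state&lt;/span&gt;&lt;/code&gt; value is an assumed example of a metalsmith instance state, and the Ironic node representation is left empty for brevity.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;ctlplane_ips:
  compute-05.us-west.example.com: 10.0.2.5
instances:
  compute-05.us-west.example.com:
    hostname: compute-05.us-west.example.com
    uuid: 11111111-2222-3333-4444-555555555555  # placeholder
    state: active                               # assumed example value
    ip_addresses:
      ctlplane:
        - 10.0.2.5
    node: {}                                    # full Ironic node omitted here
# Subsets of "instances", keyed by the same host names
existing_instances: {}
new_instances:
  compute-05.us-west.example.com:
    hostname: compute-05.us-west.example.com
    # remaining fields are identical to the entry under "instances"
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;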
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="baremetal-environment-yaml"&gt;
&lt;h4&gt;baremetal_environment.yaml&lt;/h4&gt;
&lt;p&gt;This file will serve as an output of the bare metal provisioning process. It
will be fed into the overcloud deployment command. Its goal is to provide
information for the &lt;em&gt;deployed-server&lt;/em&gt; workflow.&lt;/p&gt;
&lt;p&gt;The file will contain the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HostnameMap&lt;/span&gt;&lt;/code&gt; generated from role names and
hostnames, e.g.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;parameter_defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;HostnameMap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;overcloud-controller-0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-1.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;overcloud-controller-1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-2.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;overcloud-controller-2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;controller-3.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;overcloud-novacompute-0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-05.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;overcloud-novacompute-1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;compute-06.us-west.example.com&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="undeploy-instances-workflow"&gt;
&lt;h4&gt;undeploy_instances workflow&lt;/h4&gt;
&lt;p&gt;The workflow &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo.baremetal_deploy.v1.undeploy_instances&lt;/span&gt;&lt;/code&gt; will take a
list of hostnames and undeploy the corresponding nodes.&lt;/p&gt;
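&lt;p&gt;A minimal input sketch for this workflow is shown below. The spec does not name the input, so the key &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hostnames&lt;/span&gt;&lt;/code&gt; is an assumption made for illustration.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;# Hypothetical input; the key name is an assumption
hostnames:
  - compute-05.us-west.example.com
  - compute-06.us-west.example.com
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;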
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="novajoin-replacement"&gt;
&lt;h3&gt;Novajoin replacement&lt;/h3&gt;
&lt;p&gt;The &lt;em&gt;novajoin&lt;/em&gt; service is currently used to enroll nodes into IPA and provide
them with TLS certificates. Unfortunately, it has hard dependencies on Nova,
Glance and Metadata API, even though the information could be provided via
other means. Actually, the metadata API cannot always be provided with Ironic
(notably, it may not be available when using isolated provisioning networks).&lt;/p&gt;
&lt;p&gt;A potential solution is to provide the required information via a configdrive,
and make the nodes register themselves instead.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Do nothing, continue to rely on Nova and work around cases when it does
not match our goals well. See &lt;a class="reference internal" href="#problem-description"&gt;Problem Description&lt;/a&gt; for why it is not desired.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Avoid &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt;, use OpenStack Ansible modules or Bifrost. They currently
lack features (such as VIF attach/detach API) and do not have any notion of
scheduling. Implementing sophisticated enough scheduling in pure Ansible
seems a serious undertaking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Avoid Mistral, drive &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt; via Ansible. This is a potential future
direction of this work, but currently it seems much simpler to call
the &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt; Python API from Mistral actions. We would need Mistral
(or Ansible Tower) to drive Ansible anyway, because we need some API level.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove Neutron in the same change. Would reduce footprint even further, but
some operators may find the presence of an IPAM desirable. Also setting up
static DHCP would increase the scope of the implementation substantially and
complicate the upgrade even further.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep Glance but remove Nova. Does not make much sense, since Glance is only a
requirement because of Nova. Ironic can deploy from HTTP or local file
locations just as well.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Overcloud images will be exposed to unauthenticated users via HTTP. We need
to communicate clearly that secrets must not be built into images in plain
text and should be delivered via &lt;em&gt;configdrive&lt;/em&gt; instead. If it proves
a problem, we can limit ourselves to providing images via local files.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This issue exists today, as images are transferred via an insecure medium in
all supported deploy methods.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Removing two services from the undercloud will reduce potential attack
surface and simplify audit.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;The initial version of this feature will be enabled for new deployments only.&lt;/p&gt;
&lt;p&gt;The upgrade procedure will happen within a release, not between releases.
It will go roughly as follows:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Upgrade to a release where undercloud without Nova and Glance is supported.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make a full backup of the undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;image&lt;/span&gt; &lt;span class="pre"&gt;upload&lt;/span&gt;&lt;/code&gt; to ensure that the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-full&lt;/span&gt;&lt;/code&gt; images are available via HTTP(s).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The next steps will probably be automated via an Ansible playbook or a Mistral
workflow:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Mark deployed nodes &lt;em&gt;protected&lt;/em&gt; in Ironic to prevent undeploying them
by mistake.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run a Heat stack update replacing references to Nova servers with references
to deployed servers. This will require telling Heat not to remove the
instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mark nodes as managed by &lt;em&gt;metalsmith&lt;/em&gt; (optional, but simplifies
troubleshooting).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update each node’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;instance_info&lt;/span&gt;&lt;/code&gt; to refer to images over HTTP(s).&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This may require temporarily moving nodes to maintenance.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run an undercloud update removing Nova and Glance.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Nova CLI will no longer be available for troubleshooting. It should not be a
big problem in reality, as most of the problems it is used for are caused by
using Nova itself.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt; provides a CLI tool for troubleshooting and advanced users. We
will document using it for tasks like determining IP addresses of nodes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It will no longer be possible to update images via Glance API, e.g. from GUI.
It should not be a big issue, as most users use pre-built images. Advanced
operators are likely to resort to CLI anyway.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;No valid host found&lt;/em&gt; error will no longer be seen by operators. &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt;
provides more detailed errors, and is less likely to fail because of its
scheduling approach working better with the undercloud case.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A substantial speed-up is expected for deployments because of removing
several layers of indirection. The new deployment process will also fail
faster if the scheduling request cannot be satisfied.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Providing images via local files will remove the step of downloading them
from Glance, providing even more speed-up for larger images.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An operator will be able to tune concurrency of deployment via CLI arguments
or GUI parameters, rather than &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nova.conf&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;New features for bare metal provisioning will have to be developed with this
work in mind. It may mean implementing something in &lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt; code instead of
relying on Nova servers or flavors, or Glance images.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Dmitry Tantsur, IRC: dtantsur, LP: divius&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;Phase 1 (Stein, technical preview):&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Update &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;image&lt;/span&gt; &lt;span class="pre"&gt;upload&lt;/span&gt;&lt;/code&gt; to copy images into the HTTP
location and generate checksums.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement &lt;a class="reference internal" href="#deploy-instances-workflow"&gt;deploy_instances workflow&lt;/a&gt; and &lt;a class="reference internal" href="#undeploy-instances-workflow"&gt;undeploy_instances workflow&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update validations to not fail if Nova and/or Glance are not present.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement &lt;a class="reference internal" href="#deploy-roles-workflow"&gt;deploy_roles workflow&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide CLI commands for the created workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide an experimental OVB CI job exercising the new approach.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Phase 2 (T+, fully supported):&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Update &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;overcloud&lt;/span&gt; &lt;span class="pre"&gt;deploy&lt;/span&gt;&lt;/code&gt; to support the new workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support scaling down.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a &lt;a class="reference internal" href="#novajoin-replacement"&gt;Novajoin replacement&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide an upgrade workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consider deprecating provisioning with Nova and Glance.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/metalsmith/latest/"&gt;metalsmith&lt;/a&gt; library will be used for easier access to Ironic+Neutron API.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Since testing this feature requires bare metal provisioning, a new OVB job will
be created for it. Initially it will be experimental, and will move to the
check queue before the feature is considered fully supported.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation will have to be reworked to explain the new deployment approach.
Troubleshooting documentation will have to be updated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Mon, 24 Sep 2018 00:00:00 </pubDate></item><item><title>TripleO tools for testing HA deployments</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/tripleo-ha-utils.html</link><description>
 
&lt;p&gt;We need a way to verify a Highly Available TripleO deployment with proper tests
that check if the HA bits are behaving correctly.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently, we test HA behavior of TripleO deployments only by deploying
environments with three controllers and checking whether we’re able to spawn an
instance, but this is not enough.&lt;/p&gt;
&lt;p&gt;There should be a way to verify the HA capabilities of deployments, and whether
the behavior of the environment is still correct after induced failures,
simulated outages and so on.&lt;/p&gt;
&lt;p&gt;This tool should be a standalone component to be included by the user if
necessary, without breaking any of the dynamics present in TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The proposal is to create an Ansible-based project named tripleo-ha-utils that
can be consumed by the various tools that we use to deploy TripleO
environments, such as tripleo-quickstart or infrared, or by manual deployments.&lt;/p&gt;
&lt;p&gt;The project will initially cover three principal roles:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;stonith-config&lt;/strong&gt;: a playbook used to automate the creation of fencing
devices in the overcloud;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;instance-ha&lt;/strong&gt;: a playbook that automates the seventeen manual steps needed
to configure instance HA in the overcloud, test them via rally and verify
that instance HA works appropriately;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;validate-ha&lt;/strong&gt;: a playbook that runs a series of disruptive actions in the
overcloud and verifies it always behaves correctly by deploying a
heat-template that involves all the overcloud components;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Today the project exists outside the TripleO umbrella, and it is named
tripleo-quickstart-utils [1] (see “Alternatives” for the historical reasons for
this name). It is used internally inside promotion pipelines, and has
also been tested successfully in RDOCloud.&lt;/p&gt;
&lt;section id="pluggable-implementation"&gt;
&lt;h4&gt;Pluggable implementation&lt;/h4&gt;
&lt;p&gt;The base principle of the project is to give people the ability to integrate
the first roles with whatever kind of test. For example, today we’re using
a simple bash framework to interact with the cluster (so pcs commands and
other interactions), rally to test instance-ha and Ansible itself to simulate
full power outage scenarios.
The idea is to keep this pluggable approach leaving the final user the choice
about what to use.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="retro-compatibility"&gt;
&lt;h4&gt;Retro compatibility&lt;/h4&gt;
&lt;p&gt;One of the aims of this project is to be backward compatible with previous
versions of OpenStack. Starting from Liberty, the instance-ha and
stonith-config Ansible playbooks cover all the releases.
The same applies to HA testing, since the tests are plugged in depending
on the release.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;While evaluating alternatives, the first thing to consider is that this
project aims to be a TripleO-centric set of tools for HA, not a generic
OpenStack one.
We want tools to help the user answer questions like “Is the Galera bundle
cluster resource able to tolerate a stop and a consecutive start without
affecting the environment capabilities?” or “Is the environment able to
evacuate instances after being configured for Instance HA?”. And the answer we
want is YES or NO.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;em&gt;tripleo-validations&lt;/em&gt;: the most logical place to put this, at least&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;looking at the name, would be tripleo-validations. After talking with the folks
working on it, it became clear that the tripleo-validations project is not
meant to run disruptive tests, so integrating these tools would be out of scope.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;em&gt;tripleo-quickstart-extras&lt;/em&gt;: apart from the fact that this is not&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;something meant just for quickstart (the project supports infrared and
“plain” environments as well), even though we initially started there, in the
end it turned out that nobody was looking at the patches since nobody was
able to verify them. The result was a series of reviews stuck forever.
So moving back to extras would be a step backward.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None. The good thing about this solution is that there’s no impact for anyone
unless the solution gets loaded inside an existing project. Since this will be
an external project, it will not affect any existing component.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None. Unless deployments or CI runs explicitly include the roles, there
will be no impact, so performance will not change.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;rscarazz&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Import the tripleo-quickstart-utils [1] as a new repository and start new
deployments from there.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Due to the disruptive nature of these tests, the TripleO CI should not be
updated to include these tests, mostly because of timing issues.
This project should remain optionally usable by people when needed, or in
specific CI environments meant to support longer than usual jobs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;All the implemented roles are today fully documented in the
tripleo-quickstart-utils [1] project, so importing its repository as is will
also give its full documentation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;[1] Original project to import as new&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/tripleo-quickstart-utils"&gt;https://github.com/redhat-openstack/tripleo-quickstart-utils&lt;/a&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
</description><pubDate>Wed, 12 Sep 2018 00:00:00 </pubDate></item><item><title>In-flight Validations for the overcloud</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/inflight-validations.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/inflight-validations"&gt;https://blueprints.launchpad.net/tripleo/+spec/inflight-validations&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Currently, we don’t have any way to run validations inside a deploy run. This
spec aims to provide the necessary information on how to implement such
in-flight validations for an overcloud deploy.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently, operators and developers have to wait a long time before getting an
error in case a service isn’t running as expected.&lt;/p&gt;
&lt;p&gt;This leads to loss of time and resources.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;After each container/service is started, a new step is added to run one or more
validations on the deployed host in order to ensure the service is actually
working as expected at said step.&lt;/p&gt;
&lt;p&gt;These validations must not use Mistral Workflow, in order to provide support
for the undercloud/standalone case.&lt;/p&gt;
&lt;p&gt;The best way to push those validations would be through the already existing
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;deploy_steps_tasks&lt;/span&gt;&lt;/code&gt; keyword. A validation should run either at the start
of the next step, or at the end of the current step we want to check.&lt;/p&gt;
&lt;p&gt;The validations should point to an external playbook, for instance hosted in
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-validations&lt;/span&gt;&lt;/code&gt;. If creating a dedicated playbook for the
validation is not worthwhile, it may be inline, but it must be short, for example a single test
for an open port.&lt;/p&gt;
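&lt;p&gt;As a minimal sketch of such a short inline check, the fragment below uses the
Ansible &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;wait_for&lt;/span&gt;&lt;/code&gt; module to test for an open port. The task name, the
step number, the address and the port are assumptions chosen for illustration
only; they do not come from an existing service template.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;deploy_steps_tasks:
  # Hypothetical inline validation: runs only when the deploy reaches the
  # assumed step, and fails if the port is not open within 30 seconds.
  - name: Check that the example service port is open
    when: step|int == 4
    wait_for:
      host: 127.0.0.1
      port: 8080
      timeout: 30
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Anything longer than a single check of this kind would be better placed in
an external playbook, as described above.&lt;/p&gt;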
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There is no real alternative. Running the validation
Ansible playbook directly might seem like a good idea, but it would break the desired
convergence with the UI.&lt;/p&gt;
&lt;p&gt;For now, no such validations exist, so we can start fresh.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;No security impact.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;If a service isn’t starting properly, the upgrade might fail. This is also true
for a fresh deploy.&lt;/p&gt;
&lt;p&gt;We might want different validation tasks/workflows if we’re in an upgrade
state.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;End users will get an early failure when the validations detect an issue.
This is an improvement: today the deploy might fail at a later step, and might
break things due to the lack of a valid state.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Running in-flight validation WILL slow the overall deploy/upgrade process, but
on the other hand, it will ensure we have a clean state before each step.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;No other deployer impact.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Validations will need to be created and documented in order to get proper runs.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;cjeanner&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add new hook for the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;validation_tasks&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide proper documentation on its use&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Please keep in mind the Validation Framework spec when implementing things:
&lt;a class="reference external" href="https://review.openstack.org/589169"&gt;https://review.openstack.org/589169&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;TBD&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;What is the impact on the docs? Don’t repeat details discussed above, but
please reference them here.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/589169"&gt;https://review.openstack.org/589169&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 12 Sep 2018 00:00:00 </pubDate></item><item><title>TripleO Zero Footprint Installer</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/zero-footprint-installer.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/zero-footprint"&gt;https://blueprints.launchpad.net/tripleo/+spec/zero-footprint&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec introduces support for an installer mode which has zero
(or at least far fewer) dependencies than we have today. It is meant
to be an iteration of the Undercloud and All-In-One (standalone)
installers that allows you to end up with the same result without
having to install all of the TripleO dependencies on your host machine.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Installing python-tripleoclient on a host machine currently installs
a lot of dependencies, many of which may be optional for smaller
standalone type installations. Users of smaller standalone installations
can have a hard time understanding the difference between the dependencies
TripleO itself requires and the services TripleO installs.&lt;/p&gt;
&lt;p&gt;Additionally, some developers would like a fast-track way to develop and
run playbooks without requiring local installation of an Undercloud which
in many cases is done inside a virtual machine to encapsulate the dependencies
that get installed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;A new zero footprint installer can help drive OpenStack Tripleoclient
commands running within a container. Using this approach you can:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Generate Ansible playbooks from a set of Heat templates
(tripleo-heat-templates), Heat environments, and Heat parameters
exactly as we do today, but from within a container. No local dependencies
would be required to generate the playbooks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(optionally) Execute the playbooks locally on the host machine. This would
require some Ansible modules to be installed that TripleO depends on but
is a much smaller footprint than what we require elsewhere today.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Create a subpackage of python-tripleoclient which installs fewer dependencies.
The general footprint of required packages would still be quite high (lots
of OpenStack packages will still be installed for the client tooling).&lt;/p&gt;
&lt;p&gt;Or do nothing and continue to use VMs to encapsulate the dependencies for
an Undercloud/All-In-One installer and generate Ansible playbooks. However, setting
up a local VM requires more initial setup and dependencies, and is
heavier than just using a local container to generate the same playbooks.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;As a container will be used to generate Ansible playbooks the user may
need to expose some local data/files to the installer container. This is
likely a minimal concern as we already require this data to be exposed to
the Undercloud and All-In-One installers.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Faster deployment and testing of local All-In-One setups.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Faster deployment and testing of local All-In-One setups.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;dprince&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A new ‘tripleoclient’ container&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New project to drive the installation (Talon?)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Continue to work on refining the Ansible playbook modules to provide a
cleaner set of playbook dependencies, specifically those that depend on
any of the traditional TripleO/Heat agent hooks and scripts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;documentation updates&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;This new installer can likely supplement or replace some of the testing we
are doing for All-In-One (standalone) deployments in upstream CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Docs will need to be updated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Thu, 06 Sep 2018 00:00:00 </pubDate></item><item><title>Improve upgrade_tasks CI coverage with the standalone installer</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/stein/all-in-one-upgrades-jobs.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/upgrades-ci-standalone"&gt;https://blueprints.launchpad.net/tripleo/+spec/upgrades-ci-standalone&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The main goal of this work is to improve coverage of service upgrade_tasks in
tripleo ci upgrades jobs, by making use of the &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2018-June/131135.html"&gt;Standalone_installer_work&lt;/a&gt;.
Using a standalone node as a single node ‘overcloud’ allows us to exercise
both controlplane and dataplane services in the same job and within current
resources of 2 nodes and 3 hours. Furthermore, once proven successful,
this approach can be extended to include even single service upgrades testing
to vastly improve on the current coverage with respect to all the service
upgrade_tasks defined in the tripleo-heat-templates (which is currently minimal).&lt;/p&gt;
&lt;p&gt;Traditionally upgrades jobs have been restricted by resource constraints
(nodes and walltime). For example, the undercloud and overcloud upgrades are
never exercised in the same job; that is, an overcloud upgrade job uses an undercloud that is already on the target version (a so-called mixed version deployment).&lt;/p&gt;
&lt;p&gt;A further example is that upgrades jobs have typically exercised either
controlplane or dataplane upgrades (i.e. controllers only, or compute only)
and never both in the same job, again because of resource constraints. The currently running
&lt;a class="reference external" href="https://github.com/openstack-infra/tripleo-ci/blob/4101a393f29c18a84f64cd95a28c41c8142c5b05/zuul.d/multinode-jobs.yaml#L384"&gt;tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades&lt;/a&gt; job for
example has 2 nodes, where one is undercloud and one is overcloud controller.
The workflow &lt;em&gt;is&lt;/em&gt; being exercised, but controller only. Furthermore, whilst
the &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/9f1d855627cf54d26ee540a18fc8898aaccdda51/ci/environments/scenario000-multinode-containers.yaml#L21"&gt;current_upgrade_ci_scenario&lt;/a&gt; is only exercising a small subset of the
controlplane services, it is still running at well over 140 minutes. So there
is also very little coverage with respect to the upgrade_tasks across the
many different service templates defined in the tripleo-heat-templates.&lt;/p&gt;
&lt;p&gt;Thus the main goal of this work is to use the standalone installer to define
ci jobs that test the service upgrade_tasks for a one node ‘overcloud’ with
both controlplane and dataplane services. This approach is composable as the
services in the stand-alone are fully configurable. Thus after the first
iteration of compute/control, we can also define per-service ci jobs and over
time hopefully reach coverage for all the services deployable by TripleO.&lt;/p&gt;
&lt;p&gt;Finally it is worth emphasising that the jobs defined as part of this work will not
be testing the TripleO upgrades &lt;em&gt;workflow&lt;/em&gt; at all. Rather this is about testing
the service upgrade_tasks specifically. The workflow instead will be tested
using the existing ci upgrades job (&lt;a class="reference external" href="https://github.com/openstack-infra/tripleo-ci/blob/4101a393f29c18a84f64cd95a28c41c8142c5b05/zuul.d/multinode-jobs.yaml#L384"&gt;tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades&lt;/a&gt;) subject to modifications to strip it down to a bare
minimum required (e.g. hardly any services). There are more pointers to this
from the discussion at the &lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-ptg-stein"&gt;TripleO-Stein-PTG&lt;/a&gt; but ultimately we will have two
approximations of the upgrade tested in ci - the service upgrade_tasks as
described by this spec, and the workflow itself using a different ci job or
modifying the existing one.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;As described above we have not been able to have control and dataplane
services upgraded as part of the same tripleo ci job. Such a job would
have to be 3 nodes for starters (undercloud, controller, compute).&lt;/p&gt;
&lt;p&gt;A &lt;em&gt;full&lt;/em&gt; upgrade workflow would need the following steps:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;deploy undercloud, deploy overcloud&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;upgrade undercloud&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;upgrade prepare the overcloud (heat stack update generates playbooks)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;upgrade run controllers (ansible-playbook via mistral workflow)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;upgrade run computes/storage etc (repeat until all done)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;upgrade converge (heat stack update).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;The problem being solved here is that we can run only some approximation of
the upgrade workflow, specifically the upgrade_tasks, for a composed set
of services and do so within the ci timeout. The first iteration will focus on
modelling a one node ‘overcloud’ with both controller and compute services. If
we prove this to be successful we can also consider single-service upgrade
jobs (a job for testing just the nova or glance upgrade tasks, for example) for
each of the services whose upgrade tasks we want to test. Thus even though
this is just an approximation of the upgrade (upgrade_tasks only, not the full
workflow), it can hopefully allow for a wider coverage of services in ci
than is presently possible.&lt;/p&gt;
&lt;p&gt;One of the early considerations when writing this spec was how we could enforce
a separation of services with respect to the upgrade workflow. That is, enforce
that controlplane upgrade_tasks and deploy_steps are executed first and then
dataplane compute/storage/ceph as is usually the case with the upgrade workflow.
However, review comments on this spec, as well as PTG discussions around it,
pointed out that this is just an approximation of the upgrade (service
upgrade tasks, not workflow), in which case it may not be necessary to artificially
induce this control/dataplane separation here. This may need to be revisited
once implementation begins.&lt;/p&gt;
&lt;p&gt;Another core challenge that needs solving is how to collect ansible playbooks
from the tripleo-heat-templates since we don’t have a traditional undercloud
heat stack to query. This will hopefully be a lesser challenge assuming we can
re-use the transient heat process used to deploy the standalone node. Furthermore,
discussion around this point at the &lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-ptg-stein"&gt;TripleO-Stein-PTG&lt;/a&gt; has informed us of a way
to keep the heat stack after deployment with &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient/blob/a57531382535e92e2bfd417cee4b10ac0443dfc8/tripleoclient/v1/tripleo_deploy.py#L911"&gt;keep-running&lt;/a&gt; so we could just
re-use it as we would with a ‘normal’ deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;We will need to define a new ci job in the &lt;a class="reference external" href="https://github.com/openstack-infra/tripleo-ci/blob/4101a393f29c18a84f64cd95a28c41c8142c5b05/zuul.d/standalone-jobs.yaml"&gt;tripleo-ci_zuul.d_standalone-jobs&lt;/a&gt;
(preferably, following the currently ongoing &lt;a class="reference external" href="https://review.openstack.org/#/c/578432/8"&gt;ci_v3_migrations&lt;/a&gt;, defining this as
a v3 job).&lt;/p&gt;
&lt;p&gt;For the generation of the playbooks themselves we hope to use the ephemeral
heat service that is used to deploy the stand-alone node, or use the &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient/blob/a57531382535e92e2bfd417cee4b10ac0443dfc8/tripleoclient/v1/tripleo_deploy.py#L911"&gt;keep-running&lt;/a&gt;
option to the stand-alone deployment to keep the stack around after deployment.&lt;/p&gt;
&lt;p&gt;As described in the problem statement we hope to avoid the task of having to
distinguish between control and dataplane services in order to enforce that
controlplane services are upgraded first.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Add another node and have 3 node upgrades jobs together with increasing the
walltime but this is not scalable in the long term assuming limited
resources!&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;More coverage of services should mean less breakage caused by
upgrade-incompatible changes being merged.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;This might also make things easier for developers with limited access to resources,
who can take the reproducer script from the standalone jobs and get a dev env for
testing upgrades.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;tripleo-ci and upgrades squads&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;First we must solve the problem of generating the ansible playbooks, that
will include all the latest configuration from the tripleo-heat-templates at
the time of upgrade (including all upgrade_tasks etc) when there is no
undercloud Heat stack to query.&lt;/p&gt;
&lt;p&gt;We might consider some non-heat solution by parsing the tripleo-heat-templates
but I don’t think that is a feasible solution (re-inventing wheels). There is
ongoing work to transfer tasks to roles which is promising and that is another
area to explore.&lt;/p&gt;
&lt;p&gt;One obvious mechanism to explore given the current tools is to re-use the
same ephemeral heat process that the stand-alone uses in deploying the
overcloud, but setting the usual ‘upgrade-init’ environment files for a short
stack ‘update’. This is not tested at all yet so needs to be investigated
further. As identified earlier there is now in fact a &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient/blob/a57531382535e92e2bfd417cee4b10ac0443dfc8/tripleoclient/v1/tripleo_deploy.py#L911"&gt;keep-running&lt;/a&gt; option to the
tripleoclient that will keep this heat process around.&lt;/p&gt;
&lt;p&gt;For the first iteration of this work we will aim to use the minimum possible combination
of services to implement a ‘compute’/‘control’ overcloud. That is, using the existing
services from the current &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/9f1d855627cf54d26ee540a18fc8898aaccdda51/ci/environments/scenario000-multinode-containers.yaml#L21"&gt;current_upgrade_ci_scenario&lt;/a&gt; with the addition of nova-compute
and any dependencies.&lt;/p&gt;
&lt;p&gt;Finally a third major consideration is how to execute this service upgrade, that
is how to invoke the playbook generation and then run the resulting playbooks
(it probably doesn’t need to converge if we are just interested in the upgrades
tasks). One consideration might be to re-use the existing python-tripleoclient
“openstack overcloud upgrade” prepare and run sub-commands. However the first
and currently favored approach will be to use the existing stand-alone client
commands (&lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient/blob/6b0f54c07ae8d0dd372f16684c863efa064079da/tripleoclient/v1/tripleo_upgrade.py#L33"&gt;tripleo_upgrade&lt;/a&gt; &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient/blob/6b0f54c07ae8d0dd372f16684c863efa064079da/tripleoclient/v1/tripleo_deploy.py#L80"&gt;tripleo_deploy&lt;/a&gt;). So one work item is to try these
and discover any modifications we might need to make them work for us.&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Items:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Work out/confirm generation of the playbooks for the standalone upgrade tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Work out any needed changes in the client/tools to execute the ansible playbooks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define new ci job in the &lt;a class="reference external" href="https://github.com/openstack-infra/tripleo-ci/blob/4101a393f29c18a84f64cd95a28c41c8142c5b05/zuul.d/standalone-jobs.yaml"&gt;tripleo-ci_zuul.d_standalone-jobs&lt;/a&gt; with control and
compute services, that will exercise upgrade_tasks, deployment_tasks and
post_upgrade_tasks playbooks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Once this first iteration is complete we can then consider defining multiple
jobs for small subsets of services, or even for single services.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;This depends on the stand-alone installer.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;There will be at least one new job defined here&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Tue, 03 Jul 2018 00:00:00 </pubDate></item><item><title>Support Barometer(Software Fastpath Service Quality Metrics) Service</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/tripleo-barometer-integration.html</link><description>
 
&lt;p&gt;Include the URL of your launchpad blueprint:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-barometer-integration"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-barometer-integration&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The scope of the &lt;a class="reference internal" href="#barometer" id="id1"&gt;&lt;span&gt;[Barometer]&lt;/span&gt;&lt;/a&gt; project is to provide interfaces to support
monitoring of the NFVI. The project has plugins for telemetry frameworks
to enable the collection of platform stats and events and relay gathered
information to fault management applications or the VIM. The scope is
limited to collecting/gathering the events and stats and relaying them
to a relevant endpoint.&lt;/p&gt;
&lt;p&gt;The consumption of performance and traffic-related information/events
provided by this project should be a logical extension of any existing
VNF/NFVI monitoring framework.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Integration of Barometer in TripleO is a benefit for building the OPNFV platform.
The Barometer project is complementary to the Doctor project to build the fault
management framework with &lt;a class="reference internal" href="#apex-installer" id="id2"&gt;&lt;span&gt;[Apex_Installer]&lt;/span&gt;&lt;/a&gt; installer which is an OPNFV installation and
deployment tool based on TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;This spec proposes changes to automate the deployment of Barometer using TripleO.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add puppet-barometer package to the overcloud-full image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define Barometer Service in THT.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add how and when to deploy Barometer in puppet-tripleo.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The Barometer service is disabled by default in a deployment. Deployers need to
enable it if they want to use it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Akhila Kishore &amp;lt;&lt;a class="reference external" href="mailto:akhila.kishore%40intel.com"&gt;akhila&lt;span&gt;.&lt;/span&gt;kishore&lt;span&gt;@&lt;/span&gt;intel&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;As outlined in the proposed changes.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;The Barometer RPM package must be in RDO repo.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Add the test for CI scenarios.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The setup and configuration of the Barometer service should be documented.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;div role="list" class="citation-list"&gt;
&lt;div class="citation" id="barometer" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;Barometer&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://wiki.opnfv.org/display/fastpath/Barometer+Home"&gt;https://wiki.opnfv.org/display/fastpath/Barometer+Home&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="apex-installer" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;Apex_Installer&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://wiki.opnfv.org/display/apex"&gt;https://wiki.opnfv.org/display/apex&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Wed, 25 Apr 2018 00:00:00 </pubDate></item><item><title>Support Vitrage(Root Cause Analysis, RCA) Service</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/tripleo-vitrage-integration.html</link><description>
 
&lt;p&gt;Include the URL of your launchpad blueprint:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-vitrage-integration"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-vitrage-integration&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference internal" href="#vitrage" id="id1"&gt;&lt;span&gt;[Vitrage]&lt;/span&gt;&lt;/a&gt; is the official OpenStack RCA project. It can perfectly organizes,
analyzes and visualizes the holistic view of the Cloud.&lt;/p&gt;
&lt;p&gt;Vitrage provides functions as follows:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A clear view of the Cloud Topology&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deduced alarms and states&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RCA for alarms/events&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Via Vitrage, the end users can understand what happened in a complex cloud
environment, get the root cause of problems and then resolve issues in time.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently the installation and configuration of Vitrage in openstack is done
manually or using devstack. It shall be automated via tripleo.&lt;/p&gt;
&lt;p&gt;Integrating Vitrage in TripleO is a benefit for building the OPNFV platform.
It helps the OPNFV &lt;a class="reference internal" href="#doctor" id="id2"&gt;&lt;span&gt;[Doctor]&lt;/span&gt;&lt;/a&gt; project using Vitrage as inspector component to
build the fault management framework with &lt;a class="reference internal" href="#apex" id="id3"&gt;&lt;span&gt;[Apex]&lt;/span&gt;&lt;/a&gt; installer which is an OPNFV
installation and deployment tool based on TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;This spec proposes changes to automate the deployment of Vitrage using TripleO.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add puppet-vitrage package to overcloud-full image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define Vitrage Service in THT.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add how and when to deploy Vitrage in puppet-tripleo.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The Vitrage service is disabled by default in a deployment. Deployers need to
enable it if they want to use it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;dong wenjuan &amp;lt;&lt;a class="reference external" href="mailto:dong.wenjuan%40zte.com.cn"&gt;dong&lt;span&gt;.&lt;/span&gt;wenjuan&lt;span&gt;@&lt;/span&gt;zte&lt;span&gt;.&lt;/span&gt;com&lt;span&gt;.&lt;/span&gt;cn&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;As outlined in the proposed changes.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;The Vitrage RPM package must be in RDO repo.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Add the test for CI scenarios.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The setup and configuration of the Vitrage server should be documented.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;div role="list" class="citation-list"&gt;
&lt;div class="citation" id="vitrage" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;Vitrage&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://wiki.openstack.org/wiki/Vitrage"&gt;https://wiki.openstack.org/wiki/Vitrage&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="apex" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id3"&gt;Apex&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://wiki.opnfv.org/display/apex"&gt;https://wiki.opnfv.org/display/apex&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="citation" id="doctor" role="doc-biblioentry"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;Doctor&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://wiki.opnfv.org/display/doctor"&gt;https://wiki.opnfv.org/display/doctor&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Tue, 13 Mar 2018 00:00:00 </pubDate></item><item><title>UI Automation Testing</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/ui-automation-testing.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/automated-ui-testing"&gt;https://blueprints.launchpad.net/tripleo/+spec/automated-ui-testing&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We would like to introduce a suite of automated integration tests for the
TripleO UI.  This will prevent regressions, and will lead to more stable
software.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;At the moment, upstream CI only tests for lint errors, and runs our unit tests.
We’d like to add more integration tests for tripleo-ui to the CI pipeline.  This
will include a selenium-based approach.  This allows us to simulate a browser by
using a headless browser when running in CI, and we can detect a lot more
problems than we ever could with just unit testing.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;We would like to write a Tempest plugin for tripleo-ui which uses Selenium to drive
a headless browser to execute the tests.  We chose Tempest because it’s a
standard in OpenStack, and gives us nice error reporting.&lt;/p&gt;
&lt;p&gt;We already have the &lt;a class="reference external" href="https://github.com/openstack/tempest-tripleo-ui"&gt;tempest-tripleo-ui&lt;/a&gt; project set up.&lt;/p&gt;
&lt;p&gt;We plan to write a CI job to run our code in Tempest.  In the initial
implementation, this will only cover checking for presence of certain UI
elements, and no deployments will actually be run.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The alternative is that we do all of our testing manually, waste time, have
lower velocity, and have more bugs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The security impact of this is minimal as it’s CI-specific, and not user-facing.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;End users won’t interact with this feature.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;This feature will only consume CI resources.  There should be no negative
resource impact on the End User.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Our goal is to produce software that is more stable.  But we’re not changing any
features, per se.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will gain a higher degree of confidence in their software.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;hpokorny&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;ukalifon
akrivoka&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Write Selenium tests&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write Tempest plugin code to run Selenium tests&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write a new openstack-infra job to run the Tempest plugin on &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;check&lt;/span&gt;&lt;/code&gt; and
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;gate&lt;/span&gt;&lt;/code&gt;.  At first, this will be a simple sanity job to make sure that the UI
has been rendered.  The CI job won’t run a deployment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Tempest&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Selenium&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;This is a bit meta.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will document how a developer who is new to the tripleo-ui project can get
started with writing new integration tests.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;openstack-dev mailing list discussion:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2017-June/119185.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2017-June/119185.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2017-July/119261.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2017-July/119261.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 21 Feb 2018 00:00:00 </pubDate></item><item><title>TripleO - Ansible upgrade Workflow with UI integration</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/queens/tripleo_ansible_upgrades_workflow.html</link><description>
 
&lt;p&gt;Include the URL of your launchpad blueprint:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/major-upgrade-workflow"&gt;https://blueprints.launchpad.net/tripleo/+spec/major-upgrade-workflow&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;During the Pike cycle the minor update and some parts of the major upgrade
are significantly different to any previous cycle, in that they are &lt;em&gt;not&lt;/em&gt; being
delivered onto nodes via Heat stack update. Rather, Heat stack update is used
to only collect, but not execute, the relevant ansible tasks defined in each
of the service &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/tree/master/docker/services"&gt;manifests&lt;/a&gt; as &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/211d7f32dc9cda261e96c3f5e0e1e12fc0afdbb5/docker/services/nova-compute.yaml#L147"&gt;upgrade_tasks&lt;/a&gt; or &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/60f3f10442f3b4cedb40def22cf7b6938a39b391/puppet/services/tripleo-packages.yaml#L59"&gt;update_tasks&lt;/a&gt; accordingly.
These tasks are then written as stand-alone ansible playbooks in the stack
&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/3dcc9b30e9991087b9e898e25685985df6f94361/common/deploy-steps.j2#L324-L372"&gt;outputs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These ‘config’ playbooks are then downloaded using the &lt;em&gt;openstack overcloud
config download&lt;/em&gt; &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient/blob/27bba766daa737a56a8d884c47cca1c003f16e3f/tripleoclient/v1/overcloud_config.py#L26-L154"&gt;utility&lt;/a&gt; and finally executed to deliver the actual
upgrade or update. See bugs &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1715557"&gt;1715557&lt;/a&gt; and &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1708115"&gt;1708115&lt;/a&gt; for more information
(or pointers/reviews) about this mechanism as used during the P cycle.&lt;/p&gt;
&lt;p&gt;For Queens and as discussed at the Denver &lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-ptg-queens-upgrades"&gt;PTG&lt;/a&gt; we aim to extend this approach
to include the controlplane upgrade too. That is, instead of using HEAT
SoftwareConfig and &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/f4730632a51dec2b9be6867d58184fac3b8a11a5/common/major_upgrade_steps.j2.yaml#L132-L173"&gt;Deployments&lt;/a&gt;  to &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/f4730632a51dec2b9be6867d58184fac3b8a11a5/puppet/upgrade_config.yaml#L21-L50"&gt;invoke&lt;/a&gt; ansible we should instead collect
the upgrade_tasks for the controlplane nodes into ansible playbooks that can
then be invoked to deliver the actual upgrade.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Whilst it has continually improved in each cycle, complexity and difficulty to
debug or understand what has been executed at any given point of the upgrade
is still one of the biggest complaints from operators about the TripleO
upgrades workflow. In the P cycle and as discussed above, the minor version
update and some part of the ‘non-controller’ upgrade have already moved to the
model being proposed here, i.e. generate ansible-playbooks with an initial heat
stack update and then execute them.&lt;/p&gt;
&lt;p&gt;If we are to use this approach for all parts of the upgrade, including for the
controlplane services then we will also need a mistral workbook that can handle
the download and execution of the ansible-playbook invocations. With this kind
of ansible driven workflow, executed by mistral action/workbook, we can for
the first time consider integration with the UI for upgrade/updates. This
aligns well with the &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2017-September/122089.html"&gt;effort&lt;/a&gt; by the UI team for feature parity in CLI/UI for
Queens. Note that work is already underway to add the required mistral actions,
at least for the minor update for Pike deployments, in changes
&lt;a class="reference external" href="https://review.openstack.org/#/c/487488/"&gt;487488&lt;/a&gt; and &lt;a class="reference external" href="https://review.openstack.org/#/c/487496/"&gt;487496&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Implementing a fully ansible-playbook delivered workflow for the entire major
upgrade workflow will offer a number of benefits:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;very short initial heat stack update to generate the playbooks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;easier to follow and understand what is happening at a given step of the upgrade&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;easier to debug and re-run any particular step of the upgrade&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;implies full python-tripleoclient and mistral workbook support for the
ansible-playbook invocations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;can consider integrating upgrades/updates into the UI, for the first time&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;We will need an initial heat stack update to populate the
upgrade_tasks_playbook into the overcloud stack output (the cli is just a
suggestion):&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;openstack overcloud upgrade --init --init-commands [ "sudo curl -L -o /etc/yum.repos.d/delorean-pike.repo &lt;a class="reference external" href="https://trunk.rdoproject.org/centos7-ocata/current/pike.repo"&gt;https://trunk.rdoproject.org/centos7-ocata/current/pike.repo&lt;/a&gt;",&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;"sudo yum install my_package", … ]&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;The first step of the upgrade will be used to deliver any required common
upgrade initialisation, such as switching repos to the target version,
installing any new packages required during the upgrade, and populating the upgrades playbooks.&lt;/p&gt;
&lt;p&gt;Then the operator will run the upgrade targeting specific nodes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;openstack overcloud upgrade --nodes [overcloud-novacompute-0, overcloud-novacompute-1] or
openstack overcloud upgrade --nodes "Compute"&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Download and execute the ansible playbooks on the specified set of
nodes. Ideally we will make it possible to specify a role name with the
playbooks being invoked in a rolling fashion on each node.&lt;/p&gt;
&lt;p&gt;One of the required changes is to convert all the service templates to have
‘when’ conditionals instead of the current ‘stepN’. For Pike we did this in
the &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient/blob/4d342826d6c3db38ee88dccc92363b655b1161a5/tripleoclient/v1/overcloud_config.py#L63"&gt;client&lt;/a&gt; to avoid breaking the heat driven upgrade workflow still in use
for the controlplane during the Ocata to Pike upgrade. This will allow us to
use the ‘ansible-native’ &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/fe2acfc579295965b5f39c5ef7a34bea35f3d6bf/common/deploy-steps.j2#L364-L365"&gt;loop&lt;/a&gt; control we are currently using in the generated
ansible playbooks.&lt;/p&gt;
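&lt;p&gt;The conversion described above can be sketched as follows. This is a minimal
illustration rather than the actual templates: the file name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;upgrade_tasks.yaml&lt;/span&gt;&lt;/code&gt;,
the host group, the service and package names, and the step range are all
assumptions. Each task carries a ‘when’ conditional on a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;step&lt;/span&gt;&lt;/code&gt; variable, and
the generated playbook supplies that variable through a loop:&lt;/p&gt;
&lt;pre&gt;
# upgrade_tasks.yaml (hypothetical): each task is gated on the current step
- name: Stop the service before upgrading packages
  when: step|int == 1
  service:
    name: httpd          # assumed service name, for illustration only
    state: stopped

- name: Upgrade packages
  when: step|int == 3
  package:
    name: '*'
    state: latest

# upgrade_playbook.yaml (hypothetical): run the same task list once per step
- hosts: overcloud       # assumed inventory group
  tasks:
    - include_tasks: upgrade_tasks.yaml
      with_sequence: start=0 end=5   # assumed step range
      loop_control:
        loop_var: step
&lt;/pre&gt;
&lt;p&gt;Because &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;with_sequence&lt;/span&gt;&lt;/code&gt; yields strings, the conditionals cast the loop
variable with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;step|int&lt;/span&gt;&lt;/code&gt; before comparing it.&lt;/p&gt;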
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;There will be significant changes to the workflow and cli the operator uses
for the major upgrade as documented above.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The initial Heat stack update will not deliver any of the puppet or docker
config to nodes since the DeploymentSteps will be &lt;a class="reference external" href="https://review.openstack.org/#/c/487496/21/tripleo_common/actions/package_update.py@63"&gt;disabled&lt;/a&gt; as we currently
do for Pike minor update. This will mean a much shorter heat stack update -
exact numbers TBD but ‘minutes not hours’.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Should make it easier for developers to debug particular parts of the upgrades
workflow.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Contributors:
Marios Andreou (marios)
Mathieu Bultel (matbu)
Sofer Athlang Guyot (chem)
Steve Hardy (shardy)
Carlos Ccamacho (ccamacho)
Jose Luis Franco Arza (jfrancoa)
Marius Cornea (mcornea)
Yurii Prokulevych (yprokule)
Lukas Bezdicka (social)
Raviv Bar-Tal (rbartal)
Amit Ugol (amitu)&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Remove steps and add ‘when’ conditionals for all the ansible upgrade tasks,
minor update tasks, deployment steps, post_upgrade_tasks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Need mistral workflows that can invoke the required stages of the
workflow (--init and --nodes). There is some existing work in this
direction in &lt;a class="reference external" href="https://review.openstack.org/#/c/463765/"&gt;463765&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CLI/python-tripleoclient changes required. Related to the previous
item there is some work started on this in &lt;a class="reference external" href="https://review.openstack.org/#/c/463728/"&gt;463728&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;UI work - we will need to collaborate with the UI team for the
integration. We have never had UI driven upgrade or updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CI: Implement a simple job (one node, just a controller, which does the
heat-setup-output and runs ansible --nodes Controller) with a keystone-only
upgrade. Then iterate on this as we add service upgrade_tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Docs!&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We will aim to land a ‘keystone-only’ job asap which will be updated as the various
changes required to deliver this spec are closer to merging. For example we
may deploy only a very small subset of services (e.g. first keystone) and then iterate as changes
related to this spec are proposed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We should also track changes in the documented upgrades workflow since as
described here it is going to change significantly both internally as well as
the interface exposed to an operator.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Check the &lt;a class="reference external" href="https://raw.githubusercontent.com/openstack/tripleo-specs/master/specs/queens/tripleo_ansible_upgrades_workflow.rst"&gt;source&lt;/a&gt; for links&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Thu, 08 Feb 2018 00:00:00 </pubDate></item><item><title>Enable logging to stdout/journald for rsyslog</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/logging-stdout.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog"&gt;https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We can optimize the current logging implementation to take advantage
of metadata that our default logging driver (journald) adds to the
logs.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently, we put all the logs of the containers into a directory in
the host (/var/log/containers/). While this approach works, it relies
on mounting directories from the host itself. This makes it harder for
logging forwarders, since we need to configure them to track all those
files. With every service that we add, we end up having to write
configuration for that service for those specific files.&lt;/p&gt;
&lt;p&gt;Furthermore, we lose important metadata with this approach. We can
figure out what service wrote what log, but we lose the container name and ID,
which is very useful. These we can easily get just by using the default
docker logging mechanism.&lt;/p&gt;
&lt;p&gt;Instead of relying on the host filesystem for our logs, we can adopt a
simpler solution that both preserves important metadata that is
discarded by the current implementation and that will support most
services without requiring per-service configuration.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The proposal is to configure containerized services to log to
stdout/stderr as is common practice for containerized applications.
This allows the logs to get picked up by the docker logging driver,
and thus we can use “docker logs” to view the logs of a service as one
would usually expect. It will also help us decouple the
containers from the host, since we will no longer be relying on host
filesystem mounts for log collection.&lt;/p&gt;
&lt;p&gt;In the case of services where it’s difficult or not possible to log to
stdout or stderr, we will place log files in a docker volume, and this
volume will be shared with a sidecar container that will output the
logs to stdout so they are consumable by the logging driver. This will
also apply for containers that log only to syslog (such as HAProxy).
We will stop mounting &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/dev/log&lt;/span&gt;&lt;/code&gt; from the host, and instead add a
sidecar container that will output the logs instead.&lt;/p&gt;
&lt;p&gt;Additionally, since our default logging driver is journald, we will
get all the container logs accessible via &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;journalctl&lt;/span&gt;&lt;/code&gt; and the
journald libraries. So one would be able to do &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;journalctl&lt;/span&gt;
&lt;span class="pre"&gt;CONTAINER_NAME=&amp;lt;container&lt;/span&gt; &lt;span class="pre"&gt;name&amp;gt;&lt;/span&gt;&lt;/code&gt; to get the logs of a specific
container on the node. Furthermore, we would get extra metadata
information for each log entry [1]. We would benefit from
getting the container name (as the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;CONTAINER_NAME&lt;/span&gt;&lt;/code&gt; metadata item)
and the container ID (as the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;CONTAINER_ID&lt;/span&gt;&lt;/code&gt; and
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;CONTAINER_ID_FULL&lt;/span&gt;&lt;/code&gt; metadata items) from each journald log entry
without requiring extra processing.  Adding extra tags to the
containers is possible [2], and would get reflected via the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;CONTAINER_TAG&lt;/span&gt;&lt;/code&gt; metadata entry. These tags can optionally describe the
application that emitted the logs or describe the platform that it
comes from.&lt;/p&gt;
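&lt;p&gt;As a rough illustration of the metadata described above, the following Python sketch builds a journalctl query for one container and extracts the container fields from a JSON-formatted journal entry. It assumes journalctl's JSON output mode; the function names and the sample entry are hypothetical and only meant to show the shape of the data.&lt;/p&gt;

```python
import json


def journalctl_command(container_name):
    """Build a journalctl invocation that matches one container's entries."""
    # CONTAINER_NAME is the journal field named in the text above.
    return ["journalctl", "-o", "json", "CONTAINER_NAME=" + container_name]


def container_metadata(json_line):
    """Extract the container-related fields from one JSON journal entry."""
    entry = json.loads(json_line)
    wanted = ("CONTAINER_NAME", "CONTAINER_ID", "CONTAINER_ID_FULL", "CONTAINER_TAG")
    return {key: entry[key] for key in wanted if key in entry}


# Hypothetical journal entry, shaped like one line of JSON journal output.
sample = json.dumps({
    "MESSAGE": "example request log line",
    "CONTAINER_NAME": "nova_api",
    "CONTAINER_ID": "0123456789ab",
    "CONTAINER_ID_FULL": "0123456789ab" * 5 + "cdef",
})

print(journalctl_command("nova_api"))
print(container_metadata(sample))
```

&lt;p&gt;A log forwarder could apply the same extraction to every entry without any per-service configuration, which is the main point of the proposal.&lt;/p&gt;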
&lt;p&gt;This will also make it easier for us to forward logs, since there will
be a centralized service (journald) on each host from which we can
collect the logs.  When we add a new service, it will be a matter of
following the same logging pattern, and we will automatically be able
to forward those logs without requiring specific configuration to
track a new set of log files.&lt;/p&gt;
&lt;p&gt;With this solution in place, we need to also provide tooling to
integrate with centralized logging solutions. This will then cover
integration with the OpenShift Logging Stack [3] and ViaQ [4]. We are
proposing the use of rsyslog for message collection, manipulation, and
log forwarding.  This will also be done in a containerized fashion,
where rsyslog will be a “system container” that reads from the host
journal. Rsyslog will perform metadata extraction from log messages
(such as extracting the user, project, and domain from standard oslo
format logs), and will then finally forward the logs to a central
collector.&lt;/p&gt;
&lt;section id="pluggable-implementation"&gt;
&lt;h4&gt;Pluggable implementation&lt;/h4&gt;
&lt;p&gt;The implementation needs to be done in a pluggable manner. This is because
end-users have already created automation based on the assumption that logs
exist in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/var/log/&amp;lt;service&amp;gt;&lt;/span&gt;&lt;/code&gt; / &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;/var/log/containers/*&lt;/span&gt;&lt;/code&gt; directories
that we have been providing. For this reason, logging to stdout/stderr will be
optional, and we’ll keep logging to files in the host as a default for now.
This will then be optionally enabled via an environment file.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="example"&gt;
&lt;h4&gt;Example&lt;/h4&gt;
&lt;p&gt;nova-api container:&lt;/p&gt;
&lt;p&gt;In the proposed solution, the standard nova logs will go to the
nova_api container’s stdout/stderr. However, since we are also
interested in the apache access logs, we will then create a docker
volume where the access logs will be hosted. A sidecar container will
mount this volume, create a FIFO (named pipe) and output whatever it
gets from that file. Note that this sidecar container will need to be
started before the actual nova_api container.&lt;/p&gt;
&lt;p&gt;For each log file generated in the main container, we will create a
sidecar container that outputs that log.  This will make it easier to
associate log messages with the originating service.&lt;/p&gt;
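&lt;p&gt;The sidecar behaviour described above can be sketched in Python as follows. This is a minimal illustration, not the proposed implementation: the function name and the file name are hypothetical, and a real sidecar would reopen the pipe in a loop so that it survives restarts of the writing service.&lt;/p&gt;

```python
import os
import sys
import tempfile
import threading


def relay_fifo(fifo_path, out=sys.stdout):
    """Create a named pipe if needed and copy every line written to it to `out`."""
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)
    # Opening a FIFO for reading blocks until a writer opens the other end.
    with open(fifo_path, "r") as fifo:
        for line in fifo:
            out.write(line)
            out.flush()


def demo():
    """Run the relay once, with a thread standing in for the logging service."""
    fifo_path = os.path.join(tempfile.mkdtemp(), "access.log")
    os.mkfifo(fifo_path)

    def fake_service():
        with open(fifo_path, "w") as log:
            log.write("example access log line\n")

    writer = threading.Thread(target=fake_service)
    writer.start()
    relay_fifo(fifo_path)  # returns once the writer closes the pipe
    writer.join()


if __name__ == "__main__":
    demo()
```

&lt;p&gt;Because the pipe must exist before the service opens its log file, this also shows why the sidecar container needs to start first.&lt;/p&gt;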
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Keep logging to files in the hosts’ directory.&lt;/p&gt;
&lt;p&gt;We can still use the current solution; however, it is not ideal as it
violates container logging best practices, relies heavily on
directories on the host (which we want to avoid) and is inconsistent
in the way we can get logging from services (some in files, some in
syslog).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Since we’re not getting rid of the previous logging solution, users won’t be
impacted. They will, however, get another way of getting logs and interacting
with them in the host system, and further create automation from that if
needed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;TODO: Any performance considerations on getting everything to journald?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignees:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jaosorior
jbadiapa
larsks&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Allow services to log to stdout/stderr (if possible).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement pluggable logging for each service in t-h-t.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add Rsyslog container.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;TODO: Evaluate how we can log to an EFK stack in upstream CI. Do we have one
available?&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://docs.docker.com/engine/admin/logging/journald/"&gt;https://docs.docker.com/engine/admin/logging/journald/&lt;/a&gt;
[2] &lt;a class="reference external" href="https://docs.docker.com/engine/admin/logging/log_tags/"&gt;https://docs.docker.com/engine/admin/logging/log_tags/&lt;/a&gt;
[3] &lt;a class="reference external" href="https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html"&gt;https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html&lt;/a&gt;
[4] &lt;a class="reference external" href="https://github.com/ViaQ/Main/blob/master/README-install.md"&gt;https://github.com/ViaQ/Main/blob/master/README-install.md&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 23 Jan 2018 00:00:00 </pubDate></item><item><title>TripleO Remote Logging</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/tripleo-rsyslog-remote-logging.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/remote-logging"&gt;https://blueprints.launchpad.net/tripleo/+spec/remote-logging&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec extends the tripleo-logging spec, also for Queens, to
address key issues of log transport and storage that are separate from
the technical requirements created by logging for containerized processes.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Having logs stuck on individual overcloud nodes isn’t a workable solution
for a modern system deployed at scale. But log aggregation is complex both
to implement and to scale. TripleO should provide a robust, well documented,
and scalable solution that will serve the majority of users’ needs and be
easily extensible for others.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;In addition to the rsyslog logging to stdout defined for containers in the
tripleo-logging spec, this spec outlines in detail how logging to remote
targets should work.&lt;/p&gt;
&lt;p&gt;Essentially this comes down to a set of options for the config
of the rsyslog container. Other services will have a fixed rsyslog config
that forwards messages to the rsyslog container to pick up over journald.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Logging destination: local, remote direct, or remote aggregator.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Remote direct means sending logs directly to a storage solution, in this case
Elasticsearch or plaintext on the disk. Remote aggregator is a design where
the processing, formatting, and insertion of the logs is a task left to the
aggregator server. Using aggregators, it’s possible to scale log collection to
hundreds of overcloud nodes without overwhelming the storage backend with
inefficient connections.&lt;/p&gt;
&lt;ol class="arabic simple" start="2"&gt;
&lt;li&gt;&lt;p&gt;Log caching for remote targets&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In the case of remote targets, a caching system can be set up, where logs are
stored temporarily on the local machine in a configurable disk or memory cache
until they can be uploaded to an aggregator or storage system. While some
in-memory cache is mandatory, users may select a disk cache depending on how
important it is that all logs be saved and stored. This allows recovery
without loss of messages during network outages or service outages.&lt;/p&gt;
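&lt;p&gt;The caching idea above can be illustrated with a small Python sketch: a bounded in-memory queue with an optional disk spill file. This is not how rsyslog implements its queues; the class, its parameters, and the file name are hypothetical, and the sketch assumes that sending succeeds once the target is reachable again.&lt;/p&gt;

```python
import collections
import json
import os
import tempfile


class LogCache:
    """Hold messages in memory and spill the overflow to an optional disk file."""

    def __init__(self, memory_limit, disk_path=None):
        self.memory_limit = memory_limit
        self.disk_path = disk_path
        self.memory = collections.deque()
        self.dropped = 0

    def put(self, message):
        """Cache one message while the remote target is unreachable."""
        if len(self.memory) >= self.memory_limit:
            if self.disk_path is None:
                # No disk cache configured: the newest message is lost.
                self.dropped += 1
                return
            with open(self.disk_path, "a") as spill:
                spill.write(json.dumps(message) + "\n")
            return
        self.memory.append(message)

    def drain(self, send):
        """Send everything that is cached: memory first, then the disk spill."""
        while self.memory:
            send(self.memory.popleft())
        if self.disk_path and os.path.exists(self.disk_path):
            with open(self.disk_path) as spill:
                for line in spill:
                    send(json.loads(line))
            os.remove(self.disk_path)


# Simulate an outage: five messages arrive while only two fit in memory.
spill_file = os.path.join(tempfile.mkdtemp(), "spill.jsonl")
cache = LogCache(memory_limit=2, disk_path=spill_file)
for n in range(5):
    cache.put({"message": "line %d" % n})

delivered = []
cache.drain(delivered.append)
print(len(delivered), cache.dropped)
```

&lt;p&gt;Without a disk path, the same outage would drop every message past the memory limit, which is the trade-off the user chooses when sizing the cache.&lt;/p&gt;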
&lt;ol class="arabic simple" start="3"&gt;
&lt;li&gt;&lt;p&gt;Log security in transit&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In some cases, encryption during transit may be required. rsyslog offers
SSL/TLS-based encryption that should be easily deployable.&lt;/p&gt;
&lt;ol class="arabic simple" start="4"&gt;
&lt;li&gt;&lt;p&gt;Standard and extensible format&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By default, logs should be formatted as outlined by the Red Hat common logging
initiative. By standardizing the logging format where possible, various tools
and analytics become more portable.&lt;/p&gt;
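&lt;p&gt;Mandatory fields for this standard formatting include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;version: the version of the logging template&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;level: loglevel&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;message: the log message&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tags: user specific tagging info&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;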
&lt;p&gt;Additional fields must be added in the format of&lt;/p&gt;
&lt;p&gt;&amp;lt;subject&amp;gt;.&amp;lt;subfield name&amp;gt;&lt;/p&gt;
&lt;p&gt;See an example by rsyslog for storage in Elasticsearch below.&lt;/p&gt;
&lt;p&gt;@timestamp              November 27th 2017, 08:54:40.091
@version                2016.01.06-0
_id             AV_9wiWQzdGOuK5_zY5J
_index          logstash-2017.11.27.08
_score
_type           rsyslog
browbeat.cloud_name             openstack-12-noncontainers-beta
hostname                lorenzo.perf.lab.eng.rdu.redhat.com
level           info
message                 Stopping LVM2 PV scan on device 8:2…
pid             1
rsyslog.appname                 systemd
rsyslog.facility                daemon
rsyslog.fromhost-ip             10.12.20.155
rsyslog.inputname               imptcp
rsyslog.protocol-version                1
syslog.timegenerated            November 27th 2017, 08:54:40.092
systemd.t.BOOT_ID               1e99848dbba047edaf04b150313f67a8
systemd.t.CAP_EFFECTIVE                 1fffffffff
systemd.t.CMDLINE               /usr/lib/systemd/systemd --switched-root --system --deserialize 21
systemd.t.COMM          systemd
systemd.t.EXE           /usr/lib/systemd/systemd
systemd.t.GID           0
systemd.t.MACHINE_ID            0d7fed5b203f4664b0b4be90e4a8a992
systemd.t.SELINUX_CONTEXT               system_u:system_r:init_t:s0
systemd.t.SOURCE_REALTIME_TIMESTAMP             1511790880089672
systemd.t.SYSTEMD_CGROUP                /
systemd.t.TRANSPORT             journal
systemd.t.UID           0
systemd.u.CODE_FILE             src/core/unit.c
systemd.u.CODE_FUNCTION                 unit_status_log_starting_stopping_reloading
systemd.u.CODE_LINE             1417
systemd.u.MESSAGE_ID            de5b426a63be47a7b6ac3eaac82e2f6f
systemd.u.UNIT          lvm2-pvscan@8:2.service
tags&lt;/p&gt;
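&lt;p&gt;The field rules above (four mandatory fields, plus additional fields named subject.subfield) could be checked with a sketch like the following. The helper names and the sample values are hypothetical; only the mandatory field names and the naming rule come from this spec.&lt;/p&gt;

```python
MANDATORY_FIELDS = ("version", "level", "message", "tags")


def build_record(version, level, message, tags, extra=None):
    """Return a flat record with the mandatory fields plus namespaced extras."""
    record = {"version": version, "level": level, "message": message, "tags": tags}
    for subject, subfields in (extra or {}).items():
        for name, value in subfields.items():
            # Additional fields follow the subject.subfield naming rule.
            record[subject + "." + name] = value
    return record


def missing_fields(record):
    """List the mandatory fields that a record lacks."""
    return [field for field in MANDATORY_FIELDS if field not in record]


record = build_record(
    version="2016.01.06-0",
    level="info",
    message="example message",
    tags="",
    extra={"rsyslog": {"appname": "systemd", "facility": "daemon"}},
)
print(sorted(record))
print(missing_fields({"level": "info"}))
```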
&lt;p&gt;As a visual aid here’s a quick diagram of the flow of data.&lt;/p&gt;
&lt;p&gt;&amp;lt;rsyslog in process container&amp;gt; -&amp;gt; &amp;lt;journald&amp;gt; -&amp;gt; &amp;lt;rsyslog container&amp;gt; -&amp;gt; &amp;lt;rsyslog aggregator / Elasticsearch&amp;gt;&lt;/p&gt;
&lt;p&gt;In the process container, logs from the application are packaged with metadata
from systemd and other components, depending on how rsyslog is configured.
Journald acts as a transport, aggregating this input across all containers for
the rsyslog container, which formats the data into storable JSON and handles
tasks such as transforming fields and adding additional metadata as desired.
Finally, the data is inserted into Elasticsearch, or further held by an
aggregator for a few seconds before being bulk inserted into Elasticsearch.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;TripleO already has some level of FluentD integration, but performance issues
make it unusable at scale. Furthermore it’s not well prepared for container
logging.&lt;/p&gt;
&lt;p&gt;Ideally FluentD as a logging backend would be maintained, improved, and modified
to use the common logging format for easy swapping of solutions.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The security of remotely stored data and the log storage database is outside
of the scope of this spec. The major remaining concerns are security in
transit and the changes required to systemd for rsyslog to send data
remotely.&lt;/p&gt;
&lt;p&gt;A new systemd policy will have to be put into place to ensure that systemd
can successfully log to remote targets. By default the syslog rules prevent
any outside world access or port access, both of which are required for
log forwarding.&lt;/p&gt;
&lt;p&gt;For log encryption in transit, an SSL certificate will have to be generated and
distributed to all nodes in the cloud securely, probably during deployment.
Special care should be taken to ensure that any misconfigured instance of
rsyslog without a certificate where one is required does not transmit logs
by accident.&lt;/p&gt;
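&lt;p&gt;One way to read the requirement above is as a fail-closed check performed before forwarding is enabled. The sketch below is only an illustration of that idea; the function name and its parameters are hypothetical and do not correspond to any rsyslog or TripleO option.&lt;/p&gt;

```python
import os
import tempfile


def may_transmit(tls_required, cert_path):
    """Return True only when forwarding is safe under the configured policy."""
    if not tls_required:
        return True
    # Fail closed: a required certificate that is absent or empty blocks forwarding.
    return bool(cert_path) and os.path.isfile(cert_path) and os.path.getsize(cert_path) > 0


# Hypothetical checks: no certificate, then an empty file, then a non-empty file.
print(may_transmit(True, None))
with tempfile.NamedTemporaryFile(suffix=".pem") as cert:
    print(may_transmit(True, cert.name))
    cert.write(b"placeholder certificate data")
    cert.flush()
    print(may_transmit(True, cert.name))
```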
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Ideally users will read some documentation and pass an extra 5-6 variables to
TripleO to deploy with logging aggregation. It’s very important that logging
be easy to set up with sane defaults and no requirement on the user to implement
their own formatting or template.&lt;/p&gt;
&lt;p&gt;Users may also have to set up a database for log storage and an aggregator if
their deployment is large enough that they need one. Playbooks to do this
automatically will be provided, but probably don’t belong in TripleO.&lt;/p&gt;
&lt;p&gt;Special care will have to be taken to size storage and aggregation hardware
to the task. While rsyslog is very efficient, storage quickly becomes a problem
when a cloud can generate 100 GB of logs a day, especially since log storage
systems leave it up to the user to put rotation rules in place.&lt;/p&gt;
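&lt;p&gt;To make the sizing concern concrete, here is a back-of-the-envelope calculation in Python. The retention period, replica count, and overhead factor are hypothetical inputs chosen for illustration; only the 100 GB per day figure comes from the text above.&lt;/p&gt;

```python
def storage_needed_gb(daily_gb, retention_days, replicas=1, overhead=0.0):
    """Estimate the disk space needed to keep `retention_days` of logs."""
    return daily_gb * retention_days * replicas * (1.0 + overhead)


# 100 GB per day kept for 30 days, single copy, no indexing overhead.
print(storage_needed_gb(100, 30))
# The same volume kept for a week with two copies.
print(storage_needed_gb(100, 7, replicas=2))
```

&lt;p&gt;Under these assumptions a month of retention already needs about 3 TB, which is why rotation rules need to be planned up front.&lt;/p&gt;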
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;For small clouds, rsyslog direct to Elasticsearch will perform just fine.
As scale increases, an aggregator (also running rsyslog, except configured
to accept and format input) is required. I have yet to test a cloud large
enough that an aggregator was at all stressed. Hundreds of gigabytes of logs a
day are possible with a single VM with 32 GB of RAM as an Elasticsearch instance.&lt;/p&gt;
&lt;p&gt;For the Overcloud nodes forwarding their logs, the impact is variable depending
on the user’s configuration. CPU requirements don’t exceed single digits of a
single core even under heavy load, but storage requirements can balloon if a
large on-disk cache was specified and connectivity with the aggregator or
database is lost for prolonged periods.&lt;/p&gt;
&lt;p&gt;Memory usage is no more than a few hundred MB, and most of that is the default
in-memory log cache, which once again could be expanded by the user.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jkilpatr&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jaosorior&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;rsyslog container - jaosorior&lt;/p&gt;
&lt;p&gt;rsyslog templating and deployment role - jkilpatr&lt;/p&gt;
&lt;p&gt;aggregator and storage server deployment tooling - jkilpatr&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Blueprint dependencies:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog"&gt;https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Package dependencies:&lt;/p&gt;
&lt;p&gt;rsyslog, rsyslog-elasticsearch, rsyslog-mmjsonparse&lt;/p&gt;
&lt;p&gt;Specifically, version 8 of rsyslog, which is the earliest
supported by rsyslog-elasticsearch. These are packaged in
CentOS and RHEL 7.4 extras.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Logging aggregation can be tested in CI by deploying it during any existing CI job.&lt;/p&gt;
&lt;p&gt;For extra validation have a script to check the output into Elasticsearch.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation will need to be written about the various modes and tunables for
logging and how to deploy them, as well as sizing recommendations for the log
storage system and aggregators where required.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/490047/"&gt;https://review.openstack.org/#/c/490047/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/521083/"&gt;https://review.openstack.org/#/c/521083/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog"&gt;https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 28 Nov 2017 00:00:00 </pubDate></item><item><title>IPSEC encrypted networks</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/queens/ipsec.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/ipsec"&gt;https://blueprints.launchpad.net/tripleo/+spec/ipsec&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This proposes the usage of IPSEC tunnels for encrypting all communications in a
TripleO cloud.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Having everything in the network encrypted is a hard requirement for certain
use-cases. While TLS everywhere provides support for this, not everyone wants a
full-fledged CA. IPSEC provides an alternative which requires one component
less (the CA) while still fulfilling the security requirements, with the
downside that IPSEC tunnel configurations can get quite verbose.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;As mentioned in the mailing list [1], for OSP10 we already worked on an ansible
role that runs on top of a TripleO deployment [2].&lt;/p&gt;
&lt;p&gt;It does the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Installs IPSEC if it’s not available in the system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sets up the firewall rules.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Based on a hard-coded set of networks, it discovers the IP addresses for each
of them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Based on a hard-coded set of networks, it discovers the Virtual IP addresses
(including the Redis VIP).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It puts up an IPSEC tunnel for most IPs in each network.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Regular IPs are handled as a point-to-point IPSEC tunnel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Virtual IPs are handled with road-warrior configurations. This means that
the VIP’s tunnel listens for any connections. This enables easier
configuration of the tunnel, as the VIP-holder doesn’t need to be aware of or
configure each tunnel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Similarly to TLS everywhere, this focuses on service-to-service
communication, so we explicitly skip the tenant network (or,
as it was in the original ansible role, compute-to-compute communication).
This significantly reduces the number of tunnels we need to set up, but
leaves application security to the deployer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Authentication for the tunnels is done via a Pre-Shared Key (PSK), which is
shared between all nodes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, it creates an OCF resource that tracks each VIP and puts up or down
its corresponding IPSEC tunnel depending on the VIP’s location.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;While this resource is still in the repository [3], it has now landed
upstream [4]. Once this resource is available in the packaged version of
the resource agents, the preferred version will be the packaged one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This resource effectively handles VIP fail-overs: when it detects that a VIP
is no longer hosted by the node, it cleanly puts down the IPSEC tunnel and
enables it where the VIP is now hosted.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of this work is already part of the role. However, to have better
integration with the current state of TripleO, the following work is needed:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Support for composable networks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Now that composable networks are a thing, we can no longer rely on the
hard-coded values we had in the role.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fortunately, this is information we can get from the tripleo dynamic
inventory. So we would need to add information about the available networks
and the VIPs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configurable skipping of networks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In order to address the tenant network skipping, we need to somehow make it
configurable.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the IPSEC package as part of the image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure Firewall rules the TripleO way.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Currently the role handles the firewall rule setup. However, it should be
fairly simple to configure these rules the same way other services
configure theirs (Using the tripleo.&amp;lt;service&amp;gt;.firewall_rules entry). This
will require the usage of a composable service template.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As mentioned above, we will need to create a composable service template.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This could make use of the recently added &lt;cite&gt;external_deploy_tasks&lt;/cite&gt; section
of the templates, which will work similarly to the Kubernetes configuration
and would rely on the config-download mechanism [5].&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
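&lt;p&gt;The tunnel layout described above (point-to-point tunnels between regular IPs, road-warrior definitions for VIPs, and configurable skipping of networks) can be sketched in Python. The data structure below is a hypothetical stand-in for what the dynamic inventory might expose, the addresses are made up, and the function is not part of the tripleo-ipsec role.&lt;/p&gt;

```python
import itertools


def plan_tunnels(networks, skip=("tenant",)):
    """Return (point_to_point, road_warrior) tunnel lists for the given networks."""
    point_to_point = []
    road_warrior = []
    for name, data in sorted(networks.items()):
        if name in skip:
            continue
        # Every pair of regular node IPs on a network gets its own tunnel.
        for left, right in itertools.combinations(sorted(data.get("ips", [])), 2):
            point_to_point.append((name, left, right))
        # A VIP listens for any peer, so it needs one definition, not one per node.
        for vip in data.get("vips", []):
            road_warrior.append((name, vip))
    return point_to_point, road_warrior


# Hypothetical inventory data; real values would come from the dynamic inventory.
networks = {
    "internal_api": {
        "ips": ["172.16.2.10", "172.16.2.11", "172.16.2.12"],
        "vips": ["172.16.2.5"],
    },
    "tenant": {"ips": ["172.16.0.10", "172.16.0.11"], "vips": []},
}

p2p, rw = plan_tunnels(networks)
print(len(p2p), rw)
```

&lt;p&gt;Since the number of point-to-point tunnels grows with the number of IP pairs, skipping a network with many nodes removes a large share of the configuration.&lt;/p&gt;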
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;While deployers can already use TLS everywhere, a few are already using the
aforementioned ansible role, so this would provide a seamless upgrade path for
them.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;This by itself is a security enhancement, as it enables encryption in the
network.&lt;/p&gt;
&lt;p&gt;The PSK being shared by all the nodes is not ideal and could be addressed by
per-network PSKs. However, this work could be done in further iterations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Currently, the deployer needs to provide their PSK. However, this could be
automated as part of the tasks that TripleO does.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Same as with TLS everywhere, adding encryption in the network will have a
performance impact. We currently don’t have concrete data on what this impact
actually is.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;This would be added as a composable service. So it would be something that the
deployer would need to enable via an environment file.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jaosorior&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add libreswan (IPSEC’s frontend) package to the overcloud-full image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add required information to the dynamic inventory (networks and VIPs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Based on the inventory, create the IPSEC tunnels dynamically, and not based
on the hardcoded networks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add tripleo-ipsec ansible role as part of the TripleO umbrella.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create composable service.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This requires the tripleo-ipsec role to be available. For this, it will be
moved to the TripleO umbrella and packaged as such.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Given that this doesn’t require an extra component, we could test this as part
of our upstream tests, the requirement being that the deployment has
network-isolation enabled.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2017-November/124615.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2017-November/124615.html&lt;/a&gt;
[2] &lt;a class="reference external" href="https://github.com/JAORMX/tripleo-ipsec"&gt;https://github.com/JAORMX/tripleo-ipsec&lt;/a&gt;
[3] &lt;a class="reference external" href="https://github.com/JAORMX/tripleo-ipsec/blob/master/files/ipsec-resource-agent.sh"&gt;https://github.com/JAORMX/tripleo-ipsec/blob/master/files/ipsec-resource-agent.sh&lt;/a&gt;
[4] &lt;a class="reference external" href="https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/ipsec"&gt;https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/ipsec&lt;/a&gt;
[5] &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/master/extraconfig/services/kubernetes-master.yaml#L58"&gt;https://github.com/openstack/tripleo-heat-templates/blob/master/extraconfig/services/kubernetes-master.yaml#L58&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 21 Nov 2017 00:00:00 </pubDate></item><item><title>Network configuration</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/queens/network-configuration.html</link><description>
 
&lt;p&gt;Network configuration for the TripleO GUI&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently, it’s not possible to make advanced network configurations using the
TripleO GUI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;In the GUI, we will provide a wizard to guide the user through configuring the
networks of their deployment.  The user will be able to assign networks to
roles, and configure additional network parameters.  We will use the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_data.yaml&lt;/span&gt;&lt;/code&gt; in the &lt;a class="reference external" href="https://review.openstack.org/#/c/409921/"&gt;TripleO Heat Templates&lt;/a&gt;.   The idea is to expose
the data in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_data.yaml&lt;/span&gt;&lt;/code&gt; via the web interface.&lt;/p&gt;
&lt;p&gt;In addition to the wizard, we will implement a dynamic network topology diagram
to visually present the configured networks.  This will enable the Deployer to
quickly validate their work.  The diagram will rely on &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_data.yaml&lt;/span&gt;&lt;/code&gt;
and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;roles_data.yaml&lt;/span&gt;&lt;/code&gt; for the actual configuration.&lt;/p&gt;
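&lt;p&gt;As a sketch of how the diagram data could be derived, the following Python function joins the two files into role-to-network edges. It assumes both files have already been parsed into Python objects, and that roles carry a name and a list of networks while networks carry a name and an optional enabled flag; those key names are assumptions made for illustration.&lt;/p&gt;

```python
def topology_edges(roles_data, network_data):
    """Return sorted (role, network) pairs for the diagram to draw."""
    # Only networks that are defined and not disabled can appear in the diagram.
    known = {net["name"] for net in network_data if net.get("enabled", True)}
    edges = []
    for role in roles_data:
        for net_name in role.get("networks", []):
            if net_name in known:
                edges.append((role["name"], net_name))
    return sorted(edges)


# Hypothetical, already-parsed contents of the two YAML files.
network_data = [
    {"name": "InternalApi"},
    {"name": "Storage"},
    {"name": "Tenant", "enabled": False},
]
roles_data = [
    {"name": "Controller", "networks": ["InternalApi", "Storage", "Tenant"]},
    {"name": "Compute", "networks": ["InternalApi", "Tenant"]},
]

for role, network in topology_edges(roles_data, network_data):
    print(role, network)
```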
&lt;p&gt;For details, please see the &lt;a class="reference external" href="https://openstack.invisionapp.com/share/UM87J4NBQ#/screens"&gt;wireframes&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;As an alternative, heat templates can be edited manually to allow customization
before uploading.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The Deployer could accidentally misconfigure the network topology, and thereby
cause data to be exposed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The addition of the configuration wizard and the network topology diagram should
have no performance impact on the amount of time needed to run a deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;As with any substantial new feature, the impact on the developer is cognitive.
We will have to gain a detailed understanding of network configuration in
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_data.yaml&lt;/span&gt;&lt;/code&gt;.  Testing will also add overhead to our efforts.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;We can proceed with implementation immediately.&lt;/p&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;hpokorny&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
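&lt;li&gt;&lt;p&gt;Network configuration wizard&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Reading data from the backend&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Saving changes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;UI based on wireframes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Network topology diagram&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Investigate suitable JavaScript libraries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;UI based on wireframes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;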
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The presence of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;roles_data.yaml&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;network_data.yaml&lt;/span&gt;&lt;/code&gt; in the plan&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A javascript library for drawing the diagram&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Testing shouldn’t pose any real challenges, with the exception of the network
topology diagram rendering.  The effort involved is currently unknown, as it depends on
the chosen JavaScript library.  Verifying that the correct diagram is displayed
using automated testing might be non-trivial.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We should document the new settings introduced by the wizard.  The documentation
should be transferable between the heat template project and the TripleO UI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Fri, 15 Sep 2017 00:00:00 </pubDate></item><item><title>Tripleo RPC and Notification Messaging Support</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/queens/tripleo-messaging.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo"&gt;https://blueprints.launchpad.net/tripleo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This specification proposes changes to tripleo to enable the selection
and configuration of separate messaging backends for oslo.messaging
RPC and Notification communications. This proposal is a derivative of
the work associated with the original blueprint &lt;a class="footnote-reference brackets" href="#id3" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; and specification
&lt;a class="footnote-reference brackets" href="#id4" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; to enable dual backends for oslo.messaging in tripleo.&lt;/p&gt;
&lt;p&gt;Most of the groundwork to enable dual backends was implemented during
the pike release and the introduction of an alternative messaging
backend (qdrouterd) service was made. Presently, the deployment of this
alternative messaging backend is accomplished by aliasing the rabbitmq
service as the tripleo implementation does not model separate
messaging backends.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The oslo.messaging library supports the deployment of dual messaging
system backends for RPC and Notification communications. However, tripleo
currently deploys a single rabbitmq server (cluster) that serves as a
single messaging backend for both RPC and Notifications.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt; &lt;span class="o"&gt;+------------+&lt;/span&gt;         &lt;span class="o"&gt;+----------+&lt;/span&gt;
 &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;RPC&lt;/span&gt; &lt;span class="n"&gt;Caller&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Notifier&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
 &lt;span class="o"&gt;+-----+------+&lt;/span&gt;         &lt;span class="o"&gt;+----+-----+&lt;/span&gt;
       &lt;span class="o"&gt;|&lt;/span&gt;                     &lt;span class="o"&gt;|&lt;/span&gt;
       &lt;span class="o"&gt;+--+&lt;/span&gt;               &lt;span class="o"&gt;+--+&lt;/span&gt;
          &lt;span class="o"&gt;|&lt;/span&gt;               &lt;span class="o"&gt;|&lt;/span&gt;
          &lt;span class="n"&gt;v&lt;/span&gt;               &lt;span class="n"&gt;v&lt;/span&gt;
        &lt;span class="o"&gt;+-+---------------+-+&lt;/span&gt;
        &lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="n"&gt;RabbitMQ&lt;/span&gt; &lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
        &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Messaging&lt;/span&gt; &lt;span class="n"&gt;Backend&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
        &lt;span class="o"&gt;|&lt;/span&gt;                   &lt;span class="o"&gt;|&lt;/span&gt;
        &lt;span class="o"&gt;+-+---------------+-+&lt;/span&gt;
          &lt;span class="o"&gt;^&lt;/span&gt;               &lt;span class="o"&gt;^&lt;/span&gt;
          &lt;span class="o"&gt;|&lt;/span&gt;               &lt;span class="o"&gt;|&lt;/span&gt;
       &lt;span class="o"&gt;+--+&lt;/span&gt;               &lt;span class="o"&gt;+--+&lt;/span&gt;
       &lt;span class="o"&gt;|&lt;/span&gt;                     &lt;span class="o"&gt;|&lt;/span&gt;
       &lt;span class="n"&gt;v&lt;/span&gt;                     &lt;span class="n"&gt;v&lt;/span&gt;
&lt;span class="o"&gt;+------+-----+&lt;/span&gt;        &lt;span class="o"&gt;+------+-------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="n"&gt;RPC&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt;        &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Notification&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="n"&gt;Server&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt;        &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="n"&gt;Server&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+------------+&lt;/span&gt;        &lt;span class="o"&gt;+--------------+&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To support two separate and distinct messaging backends, tripleo needs
to “duplicate” the set of parameters needed to specify each messaging
system. The oslo.messaging library in OpenStack provides the API to the
messaging services. It is proposed that the implementation model the
RPC and Notification messaging services in place of the backend
messaging server (e.g. rabbitmq).&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;   &lt;span class="o"&gt;+------------+&lt;/span&gt;          &lt;span class="o"&gt;+----------+&lt;/span&gt;
   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;RPC&lt;/span&gt; &lt;span class="n"&gt;Caller&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Notifier&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
   &lt;span class="o"&gt;+-----+------+&lt;/span&gt;          &lt;span class="o"&gt;+----+-----+&lt;/span&gt;
         &lt;span class="o"&gt;|&lt;/span&gt;                      &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="o"&gt;|&lt;/span&gt;                      &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="n"&gt;v&lt;/span&gt;                      &lt;span class="n"&gt;v&lt;/span&gt;
&lt;span class="o"&gt;+-------------------+&lt;/span&gt;  &lt;span class="o"&gt;+-------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;       &lt;span class="n"&gt;RPC&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="n"&gt;Notification&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Messaging&lt;/span&gt; &lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Messaging&lt;/span&gt; &lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;                   &lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;                   &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+--------+----------+&lt;/span&gt;  &lt;span class="o"&gt;+--------+----------+&lt;/span&gt;
         &lt;span class="o"&gt;|&lt;/span&gt;                      &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="o"&gt;|&lt;/span&gt;                      &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="n"&gt;v&lt;/span&gt;                      &lt;span class="n"&gt;v&lt;/span&gt;
   &lt;span class="o"&gt;+------------+&lt;/span&gt;        &lt;span class="o"&gt;+------+-------+&lt;/span&gt;
   &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="n"&gt;RPC&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt;        &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Notification&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
   &lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="n"&gt;Server&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt;        &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="n"&gt;Server&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt;
   &lt;span class="o"&gt;+------------+&lt;/span&gt;        &lt;span class="o"&gt;+--------------+&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Introducing the separate messaging services and associated parameters in place
of the rabbitmq server is not a major rework but special consideration
must be made to upgrade paths and capabilities to ensure that existing
configurations are not impacted.&lt;/p&gt;
&lt;p&gt;Having separate messaging backends for RPC and Notification
communications provides a number of benefits. These benefits include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;tuning the backend to the messaging patterns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;increased aggregate message capacity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;reduced applied load to messaging servers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;increased message throughput&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;reduced message latency&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;etc.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;A number of issues need to be resolved in order to express RPC
and Notification messaging services on top of the backend messaging systems.&lt;/p&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The proposed change is similar to the concept of a service “backend”
that is configured by tripleo. A number of existing services support
such a backend (or plugin) model. The implementation of a messaging
service backend model should account for the following requirements:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;deploy a single messaging backend for both RPC and Notifications&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;deploy a messaging backend twice, once for RPC and once for
Notifications&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;deploy a messaging backend for RPC and a different messaging backend
for Notifications&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;deploy an external messaging backend for RPC&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;deploy an external messaging backend for Notifications&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Generally, the parameters that were required for deployment of the
rabbitmq service should be duplicated and renamed to “RPC Messaging”
and “Notify Messaging” backend service definitions. Individual backend
files would exist for each possible backend type (e.g. rabbitmq,
qdrouterd, zeromq, kafka or external). The backend selected will
correspondingly define the messaging transport for the messaging
system. The parameters to be duplicated for each messaging service include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;transport specifier&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;username&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;password (and generation)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;host&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;port&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;virtual host(s)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ssl (enabled)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ssl configuration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;health checks&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
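&lt;p&gt;To illustrate how such a duplicated parameter set might map onto oslo.messaging configuration, the sketch below assembles one transport URL per messaging service in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;transport://user:password@host:port/virtual_host&lt;/span&gt;&lt;/code&gt; form. The helper &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;build_transport_url&lt;/span&gt;&lt;/code&gt;, the dictionary keys, and all values are hypothetical and chosen only for this example; they are not tripleo parameter names.&lt;/p&gt;

```python
from urllib.parse import quote


def build_transport_url(transport, username, password, host, port, vhost=""):
    """Assemble a transport URL from one messaging service's parameters."""
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    credentials = "{}:{}".format(quote(username, safe=""), quote(password, safe=""))
    return "{}://{}@{}:{}/{}".format(transport, credentials, host, port, vhost)


# Hypothetical parameter sets: RPC on a router backend, Notifications on rabbitmq.
rpc_params = {
    "transport": "amqp",
    "username": "rpc_user",
    "password": "rpc:secret",
    "host": "192.0.2.10",
    "port": 5672,
}
notify_params = {
    "transport": "rabbit",
    "username": "notify_user",
    "password": "p@ss",
    "host": "192.0.2.20",
    "port": 5672,
    "vhost": "notify",
}

rpc_url = build_transport_url(**rpc_params)
notify_url = build_transport_url(**notify_params)
print(rpc_url)
print(notify_url)
```

&lt;p&gt;Keeping the two parameter sets separate means the default deployment can simply pass the same values to both services, while a dual backend deployment overrides only one set.&lt;/p&gt;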
&lt;p&gt;Tripleo should continue to have a default configuration that deploys
RPC and Notifications messaging services on top of a single rabbitmq
backend server (cluster). Tripleo upgrades should map the legacy
rabbitmq service deployment onto the RPC and Notification messaging
services model.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The configuration of separate messaging backends could be performed after
overcloud deployment (i.e. external to the tripleo framework). This would
be problematic over the lifecycle of deployments, e.g. during upgrades.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The deployment of dual messaging backends for RPC and Notification
communications should be the same from a security standpoint. This
assumes the backends have parity from a security feature
perspective, e.g. authentication and encryption.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Depending on the configuration of the messaging backend deployment,
there could be a number of end user impacts including the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;monitoring of separated messaging backend services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;understanding differences in functionality/behaviors between different
messaging backends (e.g. broker versus router, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;handling exceptions (e.g. different places for logs, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Using separate messaging systems for RPC and Notifications should
have a positive impact on performance and scalability by:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;separating RPC and Notification messaging loads&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;increasing parallelism in message processing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;increasing aggregate message transfer capacity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;allowing backend configuration to be tuned to messaging patterns&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The deployment of hybrid messaging will be new to OpenStack
operators. Operators will need to learn the architectural differences
as compared to a single backend deployment. This will include capacity
planning, monitoring, troubleshooting and maintenance best practices.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Discuss things that will affect other developers working on OpenStack.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignee:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Andy Smith &amp;lt;&lt;a class="reference external" href="mailto:ansmith%40redhat.com"&gt;ansmith&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;John Eckersberg &amp;lt;&lt;a class="reference external" href="mailto:jeckersb%40redhat.com"&gt;jeckersb&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;tripleo-heat-templates:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Modify &lt;em&gt;puppet/services/&amp;lt;service&amp;gt;base.yaml&lt;/em&gt; to introduce separate RPC and
Notification Messaging parameters (e.g. replace ‘rabbit’ parameters)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support two ssl environments (e.g. one for RPC and one for
Notification when separate backends are deployed)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consider example backend model such as the following:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;heat&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;templates&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;environments&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rabbitmq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;qdrouterd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;zmq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;puppet&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;services&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rabbitmq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;qdrouterd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;zmq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;messaging&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;roles&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;puppet_tripleo:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Replace rabbitmq_node_names with messaging_rpc_node_names and
messaging_notify_node_names or similar&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add vhost support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consider example backend model such as the following:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;puppet&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;manifests&lt;/span&gt;
   &lt;span class="o"&gt;|&lt;/span&gt;
   &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;
      &lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;
         &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;messaging&lt;/span&gt;
            &lt;span class="o"&gt;|&lt;/span&gt;
            &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;
            &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;
            &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;notify&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;
               &lt;span class="o"&gt;|&lt;/span&gt;
               &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;
                  &lt;span class="o"&gt;|&lt;/span&gt;
                  &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;rabbitmq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;
                  &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;qdrouterd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;
                  &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;zmq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;
                  &lt;span class="o"&gt;+--+&lt;/span&gt; &lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;tripleo_common:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add user and password management for RPC and Notification messaging services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support distinct health checks for separated messaging backends&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;pacemaker:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Determine what should happen when two separate rabbitmq clusters
are deployed. Does this result in two pacemaker services or one?
Some experimentation may be required.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;In order to test this in CI, an environment will be needed where separate
messaging system backends (e.g. a RabbitMQ server and a dispatch-router
server) are deployed. Any existing hardware configuration should be
appropriate for the dual backend deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The deployment documentation will need to be updated to cover the
configuration of the separate messaging (RPC and Notify) services.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id3" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/om-dual-backends"&gt;https://blueprints.launchpad.net/tripleo/+spec/om-dual-backends&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="id4" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/396740/"&gt;https://review.openstack.org/#/c/396740/&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Thu, 14 Sep 2017 00:00:00 </pubDate></item><item><title>Adding OVS Hardware Offload to TripleO</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/queens/triplo-ovs-hw-offload.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-ovs-hw-offload"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-ovs-hw-offload&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;OVS Hardware Offload leverages SR-IOV technology to control the SR-IOV
VF using the VF representor port. OVS 2.8.0 supports the hw-offload option, which
allows OVS datapath rules to be offloaded to hardware using the Linux traffic control
tool and the VF representor port. This feature accelerates OVS
with an SR-IOV NIC that supports switchdev mode.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Today the installation and configuration of the OVS hardware offload feature is
done manually after overcloud deployment. It shall be automated via tripleo.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Configure the SR-IOV NIC to be in switchdev mode using the following
syntax for NeutronSriovNumVFs: &amp;lt;physical_interface&amp;gt;:&amp;lt;numvfs&amp;gt;:&amp;lt;mode&amp;gt;.
The mode can be legacy or switchdev.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure OVS with other_config:hw-offload. The option can
be added for the whole cluster without side effects, because if the NIC doesn’t
support offload, OVS will fall back to the kernel datapath.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nova scheduler should be configured to use the PciPassthroughFilter
(same as for SR-IOV)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nova compute should be configured with passthrough_whitelist (same as for SR-IOV)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
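&lt;p&gt;The &amp;lt;physical_interface&amp;gt;:&amp;lt;numvfs&amp;gt;:&amp;lt;mode&amp;gt; syntax described above could be parsed along the following lines. This is a minimal sketch and not the tripleo implementation: the function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;parse_vf_def&lt;/span&gt;&lt;/code&gt; and the interface name are hypothetical, and treating an omitted mode as legacy is an assumption.&lt;/p&gt;

```python
VALID_MODES = ("legacy", "switchdev")


def parse_vf_def(vf_def):
    """Split an 'interface:numvfs[:mode]' entry into (interface, numvfs, mode)."""
    parts = vf_def.split(":")
    if len(parts) not in (2, 3):
        raise ValueError("expected interface:numvfs[:mode], got %r" % vf_def)
    interface = parts[0]
    if not interface or not parts[1].isdigit():
        raise ValueError("invalid interface or VF count in %r" % vf_def)
    # Assumption: an omitted mode falls back to legacy SR-IOV.
    mode = parts[2] if len(parts) == 3 else "legacy"
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s, got %r" % (VALID_MODES, mode))
    return interface, int(parts[1]), mode


# Hypothetical interface name, used only for illustration.
print(parse_vf_def("ens1f0:4:switchdev"))
print(parse_vf_def("ens1f0:4"))
```

&lt;p&gt;Accepting the two-field form keeps existing NeutronSriovNumVFs values valid, which is one possible way to stay backward compatible.&lt;/p&gt;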
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;OVS Hardware Offload leverages SR-IOV technology to provide near
native I/O performance for each virtual machine that is managed by Open vSwitch.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The operator shall ensure that the BIOS supports VT-d/IOMMU virtualization
technology on the compute nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IOMMU needs to be enabled on the Compute+SR-IOV nodes. Boot parameters
(intel_iommu=on or amd_iommu=pt) shall be added to grub.conf using
PreNetworkConfig.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Post deployment, operator shall&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create neutron ports prior to creating VMs (nova boot):
openstack port create --vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}' port1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create the VM with the required flavor and SR-IOV port ID:
openstack server create --image cirros-mellanox_sriov --port=port1 --flavor m1.tiny vm_a1&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;waleedm (&lt;a class="reference external" href="mailto:waleedm%40mellanox.com"&gt;waleedm&lt;span&gt;@&lt;/span&gt;mellanox&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;)&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;moshele (&lt;a class="reference external" href="mailto:moshele%40mellanox.com"&gt;moshele&lt;span&gt;@&lt;/span&gt;mellanox&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;)&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Update tripleo::host::sriov::numvfs_persistence to allow configuring SR-IOV
in switchdev mode by extending vf_defs to
&amp;lt;physical_interface&amp;gt;:&amp;lt;numvfs&amp;gt;:&amp;lt;mode&amp;gt;. Mode can be legacy, which is the default
SR-IOV mode, or switchdev, which is used for OVS hardware offload.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add a template parameter called NeutronOVSHwOffload to enable OVS hardware offload.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide an environment YAML for OVS hardware offload in tripleo-heat-templates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Since SR-IOV needs specific hardware support, this feature can only be
tested under third-party CI. We hope to provide Mellanox CI coverage for
SR-IOV and this feature.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Introduction to SR-IOV
&lt;a class="reference external" href="http://goo.gl/m7jP3"&gt;http://goo.gl/m7jP3&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SR-IOV OVS hardware offload netdevconf
&lt;a class="reference external" href="http://netdevconf.org/1.2/papers/efraim-gerlitz-sriov-ovs-final.pdf"&gt;http://netdevconf.org/1.2/papers/efraim-gerlitz-sriov-ovs-final.pdf&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OVS hardware offload in Open vSwitch
&lt;a class="reference external" href="https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/330606.html"&gt;https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/330606.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenStack OVS mechanism driver support in neutron/nova/os-vif
&lt;a class="reference external" href="https://review.openstack.org/#/c/398265/"&gt;https://review.openstack.org/#/c/398265/&lt;/a&gt;
&lt;a class="reference external" href="https://review.openstack.org/#/c/275616/"&gt;https://review.openstack.org/#/c/275616/&lt;/a&gt;
&lt;a class="reference external" href="https://review.openstack.org/#/c/460278/"&gt;https://review.openstack.org/#/c/460278/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Sun, 10 Sep 2017 00:00:00 </pubDate></item><item><title>Add Adapter Teaming to os-net-config</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/os-net-config-teaming.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/os-net-config/+spec/os-net-config-teaming"&gt;https://blueprints.launchpad.net/os-net-config/+spec/os-net-config-teaming&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec describes adding features to os-net-config to support adapter teaming
as an option for bonded interfaces. Adapter teaming allows additional features
over regular bonding, due to the use of the teaming agent.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;os-net-config supports both OVS bonding and Linux kernel bonding, but some
users want to use adapter teaming instead of bonding. Adapter teaming provides
additional options that bonds don’t support, and supports almost all of the
options that bonds do.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Add a new class similar to the existing bond classes that allows for the
configuration of the teamd daemon through teamdctl. The syntax for the
configuration of the teams should be functionally similar to configuring
bonds.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We already have two bonding methods in use, the Linux bonding kernel module,
and Open vSwitch. However, adapter teaming is becoming a best practice, and
this change will open up that possibility.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The end result of using teaming instead of other modes of bonding should be
the same from a security standpoint. Adapter teaming does not interfere with
iptables or selinux.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Operators who are troubleshooting a deployment where teaming is used may need
to familiarize themselves with the teamdctl utility.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Using teaming rather than bonding will have a mostly positive impact on
performance. Teaming is very lightweight, and may use less CPU than other
bonding modes, especially OVS. Teaming has the following impacts:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Fine-grained control over load balancing hashing algorithms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Port priorities and stickiness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Per-port monitoring.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;In TripleO, os-net-config has existing sample templates for OVS-mode
bonds and Linux bonds. There has been some discussion with Dan Prince
about unifying the bonding templates in the future.&lt;/p&gt;
&lt;p&gt;The type of bond could be set as a parameter in the NIC config
templates. To this end, it probably makes sense to make the teaming
configuration as similar to the bonding configurations as possible.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;The teaming configuration should be kept as similar to the bonding
configuration as possible. In fact, it might be treated as a different
form of bond, as long as the required metadata for teaming can be
provided in the options.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Dan Sneddon &amp;lt;&lt;a class="reference external" href="mailto:dsneddon%40redhat.com"&gt;dsneddon&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add teaming object and unit tests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure sample templates to demonstrate usage of teaming.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test TripleO with new version of os-net-config and adapter teaming configured.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="configuration-example"&gt;
&lt;h3&gt;Configuration Example&lt;/h3&gt;
&lt;p&gt;The following is an example of a teaming configuration that os-net-config
should be able to implement:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="o"&gt;-&lt;/span&gt;
  &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;linux_team&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;team0&lt;/span&gt;
  &lt;span class="n"&gt;bonding_options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}'&lt;/span&gt;
  &lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt;
      &lt;span class="n"&gt;ip_subnet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;192.168.0.10&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;
  &lt;span class="n"&gt;members&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt;
      &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;
      &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;eno2&lt;/span&gt;
      &lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;true&lt;/span&gt;
    &lt;span class="o"&gt;-&lt;/span&gt;
      &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;
      &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;eno3&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The only difference between a Linux bond configuration and an adapter team
configuration in the above example is the type (linux_team), and the content
of the bonding_options (bonding has a different format for options).&lt;/p&gt;
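&lt;p&gt;For comparison, a Linux bond with the same members might look roughly like
the sketch below. The bond name and the bonding_options string are assumed
examples of the kernel bonding option format; they are not specified by this
spec.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;-
  type: linux_bond
  name: bond0
  # Assumed example: kernel bonding takes key=value options, not JSON.
  bonding_options: "mode=active-backup"
  addresses:
    -
      ip_subnet: 192.168.0.10/24
  members:
    -
      type: interface
      name: eno2
      primary: true
    -
      type: interface
      name: eno3
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;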
&lt;/section&gt;
&lt;section id="implementation-details"&gt;
&lt;h3&gt;Implementation Details&lt;/h3&gt;
&lt;p&gt;os-net-config will have to configure the ifcfg files for the team. The ifcfg
format for team interfaces is documented here [1].&lt;/p&gt;
&lt;p&gt;If an interface is marked as primary, then the ifcfg file for that interface
should list it at a higher than default (0) priority:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;TEAM_PORT_CONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"prio": 100}'&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The mode is set in the runner: statement, as well as any settings that
apply to that teaming mode.&lt;/p&gt;
&lt;p&gt;We have the option of using strictly ifcfg files or using the ip utility
to influence the settings of the adapter team. It appears from the teaming
documentation that either approach will work.&lt;/p&gt;
&lt;p&gt;The proposed implementation [2] of adapter teaming for os-net-config uses
only ifcfg files to set the team settings, slave interfaces, and to
set the primary interface. The potential downside of this path is that
the interface must be shut down and restarted when config changes are
made, but that is consistent with the other device types in os-net-config.
This is probably acceptable, since network changes are made rarely and
are assumed to be disruptive to the host being reconfigured.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The teamd daemon and the teamdctl command-line utility must be installed.
teamd is not installed by default on RHEL/CentOS; however, it is currently
included in the RDO overcloud-full image. teamd should be added to the list
of os-net-config RPM dependencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For LACP bonds using 802.3ad, switch support will need to be configured and
at least two ports must be configured for LACP bonding.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;In order to test this in CI, we would need to have an environment where we
have multiple physical NICs. Adapter teaming supports modes other than LACP,
so we could possibly get away with multiple links without any special
configuration.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The deployment documentation will need to be updated to cover the use of
teaming. The os-net-config sample configurations will demonstrate the use
in os-net-config. TripleO Heat template examples should also help with
deployments using teaming.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;[1] - Documentation: Creating a Network Team Using ifcfg Files
&lt;a class="reference external" href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Configure_a_Network_Team_Using-the_Command_Line.html#sec-Creating_a_Network_Team_Using_ifcfg_Files"&gt;https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Configure_a_Network_Team_Using-the_Command_Line.html#sec-Creating_a_Network_Team_Using_ifcfg_Files&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[2] - Review: Add adapter teaming support using teamd for ifcfg-systems
&lt;a class="reference external" href="https://review.openstack.org/#/c/339854/"&gt;https://review.openstack.org/#/c/339854/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Mon, 28 Aug 2017 00:00:00 </pubDate></item><item><title>Pacemaker Next Generation Architecture</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/pacemaker-next-generation-architecture.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/ha-lightweight-architecture"&gt;https://blueprints.launchpad.net/tripleo/+spec/ha-lightweight-architecture&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Change the existing HA manifests and templates to deploy a minimal pacemaker
architecture, where all the openstack services are started and monitored by
systemd with the exception of: VIPs/Haproxy, rabbitmq, redis and galera.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The pacemaker architecture deployed currently via
&lt;cite&gt;puppet/manifests/overcloud_controller_pacemaker.pp&lt;/cite&gt; manages most
services on the controllers via pacemaker. This approach, while having the
advantage of having a single entity managing and monitoring all services, does
bring a certain complexity to it and assumes that the operators are quite
familiar with pacemaker and its management of resources. The aim is to
propose a new architecture, replacing the existing one, where pacemaker
controls the following resources:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Virtual IPs + HAProxy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RabbitMQ&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Galera&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Redis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;openstack-cinder-volume (as the service is not A/A yet)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Any future Active/Passive service&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Basically, every service that is managed today by a specific resource agent
and not by systemd will still run under pacemaker. The same goes
for any service (like openstack-cinder-volume) that needs to be active/passive.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Initially the plan was to create a brand new template implementing this
new HA architecture. After a few rounds of discussions within the TripleO
community, it has been decided to actually have a single HA architecture.
The main reasons for moving to a single next generation HA architecture are
the amount of work needed to maintain two separate architectures and the
fact that the previous HA architecture does not bring substantial advantages
over this next generation one.&lt;/p&gt;
&lt;p&gt;The new architecture will enable most services via systemd and will remove most
pacemaker resource definitions with their corresponding constraints.
In terms of ordering constraints we will go from a graph like this one:
&lt;a class="reference external" href="http://acksyn.org/files/tripleo/wsgi-openstack-core.pdf"&gt;http://acksyn.org/files/tripleo/wsgi-openstack-core.pdf&lt;/a&gt; (mitaka)&lt;/p&gt;
&lt;p&gt;to a graph like this one:
&lt;a class="reference external" href="http://acksyn.org/files/tripleo/light-cib-nomongo.pdf"&gt;http://acksyn.org/files/tripleo/light-cib-nomongo.pdf&lt;/a&gt; (next-generation-mitaka)&lt;/p&gt;
&lt;p&gt;Once this new architecture is in place and we have tested it extensively, we
can work on the upgrade path from the previous fully-fledged pacemaker HA
architecture to this new one. Since the impact of pacemaker in the new
architecture is quite small, it is possible to consider dropping the non-ha
template in the future for every deployment and every CI job. The decision
on this can be taken in a later step, even post-newton.&lt;/p&gt;
&lt;p&gt;Another side-benefit is that with this newer architecture the
whole upgrade/update topic is much easier to manage with TripleO,
because there is less coordination needed between pacemaker, the update
of openstack services, puppet and the update process itself.&lt;/p&gt;
&lt;p&gt;Note that once composable services land, this next generation architecture will
merely consist of a single environment file setting some services to be
started via systemd, some via pacemaker and a bunch of environment variables
needed for the services to reconnect even when galera and rabbitmq are down.
All services that need to be started via systemd will be done via the default
state:
&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/40ad2899106bc5e5c0cf34c40c9f391e19122a49/overcloud-resource-registry-puppet.yaml#L124"&gt;https://github.com/openstack/tripleo-heat-templates/blob/40ad2899106bc5e5c0cf34c40c9f391e19122a49/overcloud-resource-registry-puppet.yaml#L124&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The services running via pacemaker will be explicitly listed in an
environment file, like here:
&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/40ad2899106bc5e5c0cf34c40c9f391e19122a49/environments/puppet-pacemaker.yaml#L12"&gt;https://github.com/openstack/tripleo-heat-templates/blob/40ad2899106bc5e5c0cf34c40c9f391e19122a49/environments/puppet-pacemaker.yaml#L12&lt;/a&gt;&lt;/p&gt;
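&lt;p&gt;As a hedged sketch of what such an environment file could contain, the
fragment below overrides a few services with pacemaker-managed variants. The
template paths are illustrative assumptions and are not copied from the linked
file; any service not listed would keep its default systemd-managed
implementation.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;resource_registry:
  # Paths below are assumptions for illustration only.
  OS::TripleO::Services::RabbitMQ: ../puppet/services/pacemaker/rabbitmq.yaml
  OS::TripleO::Services::HAproxy: ../puppet/services/pacemaker/haproxy.yaml
  OS::TripleO::Services::CinderVolume: ../puppet/services/pacemaker/cinder-volume.yaml
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;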
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There are many alternative designs for the HA architecture. The decision
to use pacemaker only for a certain set of “core” services and all the
Active/Passive services comes from a careful balance between complexity
of the architecture and its management and being able to recover resources
in a known broken state. There is a main assumption here about native
openstack services:&lt;/p&gt;
&lt;p&gt;They &lt;em&gt;must&lt;/em&gt; be able to start when the broker and the database are down and keep
retrying.&lt;/p&gt;
&lt;p&gt;The reason for using only pacemaker for the core services and not, for
example keepalived for the Virtual IPs, is to keep the stack simple and
not introduce multiple distributed resource managers. Also, if we used
only keepalived, we’d have no way of recovering from a failure beyond
trying to relocate the VIP.&lt;/p&gt;
&lt;p&gt;The reason for keeping haproxy under pacemaker’s management is that
we can guarantee that a VIP will always run where haproxy is running,
should an haproxy service fail.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;No changes regarding security aspects compared to the existing status quo.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The operators working with a cloud are impacted in the following ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The services (galera, redis, openstack-cinder-volume, VIPs,
haproxy) will be managed as usual via &lt;cite&gt;pcs&lt;/cite&gt;. Pacemaker will monitor these
services and provide their status via &lt;cite&gt;pcs status&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All other services will be managed via &lt;cite&gt;systemctl&lt;/cite&gt;, and systemd will be
configured to automatically restart a failed service. Note that this is
already done in RDO with (Restart={always,on-failure}) in the service files.
This setting is a no-op when pacemaker manages the service, because pacemaker
creates an override file:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/ClusterLabs/pacemaker/blob/master/lib/services/systemd.c#L547"&gt;https://github.com/ClusterLabs/pacemaker/blob/master/lib/services/systemd.c#L547&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;With the new architecture, restarting a native openstack service across
all controllers will require restarting it via &lt;cite&gt;systemctl&lt;/cite&gt; on each node (as opposed
to a single &lt;cite&gt;pcs&lt;/cite&gt; command, as is done today).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All services will be configured to retry indefinitely when connecting to
the database or to the messaging broker. In case of a controller failure,
the failover scenario will be the same as with the current HA architecture,
with the difference that the services will simply keep trying to reconnect.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Previously with the HA template every service would be monitored and managed by
pacemaker. With the split between openstack services being managed by systemd and
“core” services managed by pacemaker, the operator needs to know which service
to monitor with which command.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;No changes compared to the existing architecture.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;In the future we might see if the removal of the non-HA template is feasible,
thereby simplifying our CI jobs and leaving a single, better-maintained template.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;michele&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;…&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Prepare the roles that deploy the next generation architecture.  Initially,
keep it as close as possible to the existing HA template and make it simpler
in a second iteration (remove unnecessary steps, etc.) Template currently
lives here and deploys successfully:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/314208/"&gt;https://review.openstack.org/#/c/314208/&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test failure and recovery scenarios, and open bugs against services that
misbehave when the database and/or broker are down.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Initial smoke-testing has been completed successfully. Another set of tests
focusing on the behaviour of openstack services when galera and rabbitmq are
down is in the process of being run.&lt;/p&gt;
&lt;p&gt;Particular focus will be on failover scenarios and recovery times and making
sure that there are no regressions compared to the current HA architecture.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Currently we do not describe the architectures as deployed by TripleO itself,
so no changes needed. A short page in the docs describing the architecture
would be a nice thing to have in the future.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;This design came mostly out of a meeting in Brno with the following attendees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Andrew Beekhof&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chris Feist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Eoghan Glynn&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fabio Di Nitto&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Graeme Gillies&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hugh Brock&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Javier Peña&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Jiri Stransky&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lars Kellogg-Steadman&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mark Mcloughlin&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Michele Baldessari&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Raoul Scarazzini&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rob Young&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Mon, 28 Aug 2017 00:00:00 </pubDate></item><item><title>Best practices for logging of containerized services</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/containerized-services-logs.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/containerized-services-logs"&gt;https://blueprints.launchpad.net/tripleo/+spec/containerized-services-logs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Containerized services shall persist their logs. There are many ways to address
that. The scope of this blueprint is to suggest best practices, as well as
intermediate implementation steps for the Pike release.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Pike will be released with a notion of hybrid deployments, in which some
services may run in containers managed by the docker daemon, while others
may be managed by systemd or Pacemaker and placed on hosts directly.&lt;/p&gt;
&lt;p&gt;The notion of composable deployments also assumes end users and
developers may want to deploy some services non-containerized, and tripleo
heat templates shall not prevent them from doing so.&lt;/p&gt;
&lt;p&gt;Regardless of the service placement type, end users and developers shall get
all logs persisted, consistent and available for future analysis.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;As the spec transitions from Pike, some of the sections below are
split into the Pike and Queens parts.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The scope of this document for Pike is limited to recommendations for
developers of containerized services, bearing in mind use cases for hybrid
environments. It addresses only intermediate implementation steps for Pike and
smooth UX with upgrades from Ocata to Pike, and with future upgrades from Pike
as well.&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://12factor.net/logs"&gt;12factor&lt;/a&gt; methodology is the general guideline for logging
in containerized apps. Based on it, we rephrase our main design assumption as:
“each running process writes a single event stream, which is persisted outside
of its container”. And we add a design constraint: “each container
has a single running foreground process, and nothing else requires persistent
logs that may outlast the container execution time”. This assumes all streams
but the main event stream are ephemeral and live no longer than the container
instance does.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;HA stateful services may require another approach; see the
alternatives section for more details.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The scope for future releases, starting from Queens, shall include best
practices for collecting (shipping), storing (persisting), processing (parsing)
and accessing (filtering) logs of hybrid TripleO deployments with advanced
techniques like EFK (Elasticsearch, Fluentd, Kibana) or the like. Hereafter
those are referred to as “future steps”.&lt;/p&gt;
&lt;p&gt;Note, this is limited to OpenStack and the Linux HA stack (Pacemaker and Corosync).
We can do nothing about the rest of the supporting and legacy apps like
webservers, load-balancing reverse proxies, database and message queue clusters.
Even if we could, this stays out of the scope of OpenStack specs.&lt;/p&gt;
&lt;p&gt;Here is a list of suggested best practices for TripleO developers for Pike:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Host services shall keep writing logs as is, having UIDs, logging configs,
rotation rules and target directories unchanged.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Host services that change their control plane to systemd or pacemaker
during the Ocata to Pike upgrade may have logging configs, rules and
destinations changed as well, but this is out of the scope of this spec.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerized services that normally log to files under the &lt;cite&gt;/var/log&lt;/cite&gt; dir
shall keep logging as is inside of containers. The logs shall be persisted
with hostpath mounted volumes placed under the &lt;cite&gt;/var/log/containers&lt;/cite&gt; path.
This is required because of the hybrid use cases. For example, containerized
nova services access &lt;cite&gt;/var/log/nova&lt;/cite&gt; with different UIDs than the host
services would have. Given that, nova containers should have log volumes
mounted as &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;-v&lt;/span&gt; &lt;span class="pre"&gt;/var/log/containers/nova:/var/log/nova&lt;/span&gt;&lt;/code&gt; (host path first) in order to
avoid conflicts. Persisted log files can then be pulled by a node agent like
fluentd or rsyslog and forwarded to a central logging service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerized services that can only log to syslog facilities: bind mount
/dev/log into all tripleo service containers as well so that the host
collects the logs via journald. This should be a standard component of our
container “API”: we guarantee (a) a log directory and (b) a syslog socket
for &lt;em&gt;every&lt;/em&gt; containerized service. Collected journald logs can then be pulled
by a node agent like fluentd or rsyslog and forwarded to a central logging
service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerized services that leverage Kolla bootstrap, extended start and/or
config facilities, shall be templated with Heat deployment steps as the
following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Host prep tasks to ensure target directories are pre-created on hosts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kolla config’s permissions to enforce ownership for log dirs (hostpath
mounted volumes).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Init container steps to chown log directories early. Kolla
bootstrap and DB sync containers are normally invoked before the
&lt;cite&gt;kolla_config&lt;/cite&gt; permissions are set, hence the need for init containers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerized services that do not use Kolla and run as root in containers
shall be running from a separate user namespace remapped to a non root host
user, for security reasons. No such services are currently deployed by
TripleO, though.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Docker daemon would have to be running under that remapped non root
user as well. See docker documentation for the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;--userns-remap&lt;/span&gt;&lt;/code&gt; option.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerized services that run under pacemaker (or pacemaker remote)
control plane and do not fall into any of the given cases: bind mount
/dev/log as well. At this stage the way services log is in line with the best
practice w.r.t “dedicated log directory to avoid conflicts”. Pacemaker
bundles isolate the containerized resources’ logs on the host into
&lt;cite&gt;/var/log/pacemaker/bundles/{resource}&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
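&lt;p&gt;The two guarantees of the container “API” above, (a) a log directory and
(b) a syslog socket, can be sketched as a small helper that builds the
corresponding docker volume arguments. This is only an illustration, not TripleO
code: the function name &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;log_volume_args&lt;/span&gt;&lt;/code&gt; is hypothetical, and the sketch
assumes docker's &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;host_path:container_path&lt;/span&gt;&lt;/code&gt; order for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;-v&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;

```python
def log_volume_args(service, needs_syslog=True):
    """Return docker ``-v`` arguments for one containerized service.

    Hypothetical helper for illustration only. Docker's ``-v`` syntax is
    ``host_path:container_path``.
    """
    args = [
        # (a) dedicated host log directory, mounted at the path the
        # service already logs to inside the container
        "-v", "/var/log/containers/{0}:/var/log/{0}".format(service),
    ]
    if needs_syslog:
        # (b) syslog socket, so that the host collects logs via journald
        args += ["-v", "/dev/log:/dev/log"]
    return args


print(" ".join(log_volume_args("nova")))
```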
&lt;p&gt;Future steps TBD.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The alternatives below apply to future steps only.&lt;/p&gt;
&lt;p&gt;Alternatively to hostpath mounted volumes, create a directory structure such
that each container has a namespace for its logs somewhere under &lt;cite&gt;/var/log&lt;/cite&gt;.
So, a container named 12345 would have &lt;em&gt;all its logs&lt;/em&gt; in the
&lt;cite&gt;/var/log/container-12345&lt;/cite&gt; directory structure (requires clarification).
This also alters the assumption that in general there is only one main log
per container. That assumption does not hold for highly available
containerized stateful services bundled with pacemaker remote, which have
multiple logs to capture, like &lt;cite&gt;/var/log/pacemaker.log&lt;/cite&gt;, logs for cluster
bootstrapping events, control plane agents, helper tools like rsyncd, and the
stateful service itself.&lt;/p&gt;
&lt;p&gt;When we have control over the logging API (e.g. via oslo.log), we can forsake
hostpath mounted volumes and configure containerized services to output to
syslog (via bind mount &lt;cite&gt;/dev/log&lt;/cite&gt;) so that the host collects the logs via
journald. Or configure services to log only to stdout, so that the docker daemon
collects logs and ships them to journald.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;The “winning” trend is switching everything (including OpenStack
services) to syslog and logging nothing to /var/log/, e.g. just bind-mount
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;-v&lt;/span&gt; &lt;span class="pre"&gt;/dev/null:/var/log&lt;/span&gt;&lt;/code&gt; for containers.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Or use a specialized log driver like the oslo.log fluentd logging driver
(instead of the default journald or json-file) to output to a fluentd log agent
running on the host or containerized as well, which would then aggregate logs
from all containers, annotate with node metadata, and use the fluentd
&lt;cite&gt;secure_forward&lt;/cite&gt; protocol to send the logs to a remote fluentd agent like
common logging.&lt;/p&gt;
&lt;p&gt;These are not doable for Pike, as they require too many changes that also
impact the upgrade UX. Nevertheless, this is the only recommended best practice
and end goal for future releases and future steps coming after Pike.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;As the spec transitions from Pike, the section is split into the Pike and
Queens parts.&lt;/p&gt;
&lt;p&gt;UID collisions may happen when users in containers occasionally match other
user IDs on the host, which would allow them to access logs of foreign services.
This should be mitigated with SELinux policies.&lt;/p&gt;
&lt;p&gt;Future steps impact TBD.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;As the spec transitions from Pike, the section is split into the Pike and
Queens parts.&lt;/p&gt;
&lt;p&gt;Containerized and host services will be logging under different paths. The former
to the &lt;cite&gt;/var/log/containers/foo&lt;/cite&gt; and &lt;cite&gt;/var/log/pacemaker/bundles/*&lt;/cite&gt;, the latter
to the &lt;cite&gt;/var/log/foo&lt;/cite&gt;. This impacts log collection tools like
&lt;a class="reference external" href="https://github.com/sosreport/sos"&gt;sosreport&lt;/a&gt; et al.&lt;/p&gt;
&lt;p&gt;Future steps impact TBD.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;As the spec transitions from Pike, the section is split into the Pike and
Queens parts.&lt;/p&gt;
&lt;p&gt;Hostpath mounted volumes bring no performance overhead for containerized
services’ logs. Host services are not affected by the proposed change.&lt;/p&gt;
&lt;p&gt;Future steps impact is that handling of the byte stream of stdout can
have a significant impact on performance.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;As the spec transitions from Pike, the section is split into the Pike and
Queens parts.&lt;/p&gt;
&lt;p&gt;When upgrading from Ocata to Pike, containerized services will change their
logging destination directory as described in the end user impact section.
This also impacts log collection tools like sosreport et al.&lt;/p&gt;
&lt;p&gt;Logrotate scripts must be adjusted for the &lt;cite&gt;/var/log/containers&lt;/cite&gt; and
&lt;cite&gt;/var/log/pacemaker/bundles/*&lt;/cite&gt; as well.&lt;/p&gt;
&lt;p&gt;Future steps impact TBD.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;As the spec transitions from Pike, the section is split into the Pike and
Queens parts.&lt;/p&gt;
&lt;p&gt;Developers will have to keep in mind the recommended intermediate best
practices, when designing heat templates for TripleO hybrid deployments.&lt;/p&gt;
&lt;p&gt;Developers will have to understand Kolla and Docker runtime internals, although
that’s already the case once we have containerized services onboard.&lt;/p&gt;
&lt;p&gt;Future steps impact (to be finished):&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The notion of Tracebacks in the events is difficult to handle as a byte
stream, because it becomes the responsibility of the apps to ensure output
of new-line separated text is not interleaved. That notion of Tracebacks
needs to be implemented apps side.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Oslo.log is really emitting a stream of event points, or trace points, with
rich metadata to describe those events. Capturing that metadata via a byte
stream later needs to be implemented.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Event streams of child processes, forked even temporarily, should or may need
to be captured by the parent events stream as well.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bogdando&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;michele
flaper87
larsks
dciabrin&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;As the spec transitions from Pike, the work items are split into the Pike and
Queens parts:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement an intermediate logging solution for tripleo-heat-templates for
containerized services that log under &lt;cite&gt;/var/log&lt;/cite&gt; (flaper87, bogdando). Done
for Pike.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Come up with an intermediate logging solution for containerized services that
log to syslog only (larsks). Done for Pike.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Come up with a solution for HA containerized services managed by Pacemaker
(michele). Done for Pike.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make sure that sosreport collects &lt;cite&gt;/var/log/containers/*&lt;/cite&gt; and
&lt;cite&gt;/var/log/pacemaker/bundles/*&lt;/cite&gt; (no assignee). Pending for Pike.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Adjust logrotate scripts for the &lt;cite&gt;/var/log/containers&lt;/cite&gt; and
&lt;cite&gt;/var/log/pacemaker/bundles/*&lt;/cite&gt; paths (no assignee). Pending for Pike.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify if the namespaced &lt;cite&gt;/var/log/&lt;/cite&gt; for containers works and fits the case
(no assignee).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Address the current state of OpenStack infrastructure apps as they are, and
gently move them towards these guidelines referred to as “future steps” (no
assignee).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Existing CI coverage fully fits the proposed change needs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The given best practices and intermediate solutions built from those do not
involve changes visible to end users other than those given in the end user
impact section. The same is true for developers and dev docs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/sosreport/sos"&gt;Sosreport tool&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://lists.clusterlabs.org/pipermail/users/2017-April/005380.html"&gt;Pacemaker container bundles&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://success.docker.com/KBase/Introduction_to_User_Namespaces_in_Docker_Engine"&gt;User namespaces in docker&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.docker.com/engine/admin/logging/overview/"&gt;Docker logging drivers&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://blog.oddbit.com/2017/06/14/openstack-containers-and-logging/"&gt;Engineering blog posts&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Mon, 28 Aug 2017 00:00:00 </pubDate></item><item><title>GUI logging</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/gui-logging.html</link><description>
 
&lt;p&gt;The TripleO GUI currently has no way to persist logging information.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The TripleO GUI is a web application without its own dedicated backend.  As
such, any and all client-side errors are lost when the End User reloads the page
or navigates away from the application.  When things go wrong, the End User is
unable to retrieve client-side logs because this information is not persisted.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;I propose that we use Zaqar as a persistence backend for client-side logging.
At present, the web application is already communicating with Zaqar using
websockets.  We can use this connection to publish new messages to a dedicated
logging queue.&lt;/p&gt;
&lt;p&gt;Zaqar messages have a TTL of one hour.  So once every thirty minutes, Mistral
will query Zaqar using a cron trigger, and retrieve all messages from the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-ui-logging&lt;/span&gt;&lt;/code&gt; queue.  Mistral will then look for a file called
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-ui-log&lt;/span&gt;&lt;/code&gt; in Swift.  If this file exists, Mistral will check its size.
If the size exceeds a predetermined size (e.g. 10MB), Mistral will rename it to
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-ui-log-&amp;lt;timestamp&amp;gt;&lt;/span&gt;&lt;/code&gt;, and create a new file in its place.  The file
will then receive the messages from Zaqar, one per line.  Once we reach, let’s
say, a hundred archives (about 1GB) we can start dropping data in order
to prevent unnecessary data accumulation.&lt;/p&gt;
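&lt;p&gt;The rotation logic described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the Mistral workflow itself: a
plain dict stands in for the Swift container, the function name
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;store_messages&lt;/span&gt;&lt;/code&gt; is hypothetical, and the sketch assumes that the oldest
archives are the ones dropped once the limit is reached.&lt;/p&gt;

```python
import time

MAX_BYTES = 10 * 1024 * 1024   # rotate once the live log exceeds about 10MB
MAX_ARCHIVES = 100             # keep about 1GB of archives at most
LOG_NAME = "tripleo-ui-log"


def archive_timestamp(name):
    """Extract the integer timestamp suffix from an archive name."""
    return int(name.rsplit("-", 1)[1])


def store_messages(store, messages, now=None):
    """Append messages to the live log, rotating and pruning as needed.

    ``store`` is a plain dict (object name to text) standing in for a
    Swift container; ``messages`` is a list of strings drained from Zaqar.
    """
    now = int(time.time()) if now is None else now
    current = store.get(LOG_NAME, "")
    if len(current.encode("utf-8")) > MAX_BYTES:
        # Rename the oversized log to an archive and start a fresh one.
        store["{0}-{1}".format(LOG_NAME, now)] = current
        current = ""
    # One message per line.
    store[LOG_NAME] = current + "".join(m + "\n" for m in messages)
    # Assumption: drop the oldest archives beyond the limit.
    archives = sorted(
        (k for k in store if k.startswith(LOG_NAME + "-")),
        key=archive_timestamp,
    )
    for name in archives[:max(0, len(archives) - MAX_ARCHIVES)]:
        del store[name]
    return store
```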
&lt;p&gt;To view the logging data, we can ask Swift for the 10 latest files with a prefix
of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-ui-log&lt;/span&gt;&lt;/code&gt;.  These files can be presented in the GUI for download.
Should the user require, we can present a “View more” link that will display the
rest of the collected files.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None at this time&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;There is a chance of logging sensitive data.  I propose that we apply some
common scrubbing mechanism to the messages before they are stored in Swift.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Sending additional messages over an existing websocket connection should have
a negligible performance impact on the web application.  Likewise, running
hourly cron tasks in Mistral shouldn’t impose a significant burden on the
undercloud machine.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers should also benefit from having a centralized logging system in
place as a means of improving productivity when debugging.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;hpokorny&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Introduce a central logging system (already in progress, see &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/websocket-logging"&gt;blueprint&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introduce a global error handler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Convert all logging messages to JSON using a standard format&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configuration: the name for the Zaqar queue to carry the logging data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introduce a Mistral workflow to drain a Zaqar queue and publish the acquired
data to a file in Swift&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introduce GUI elements to download the log files&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We can write unit tests for the code that handles sending messages over the
websocket connection.  We might be able to write an integration smoke test that
will ensure that a message is received by the undercloud.  We can also add some
testing code to tripleo-common to cover the logic that drains the queue, and
publishes the log data to Swift.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We need to document the default name of the Zaqar queue, the maximum size of
each log file, and how many log files can be stored at most.  On the End User
side, we should document the fact that a GUI-oriented log is available, and the
way to get it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Mon, 28 Aug 2017 00:00:00 </pubDate></item><item><title>Tool send email with tripleo tempest results</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/send-mail-tool.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/send-mail-tool"&gt;https://blueprints.launchpad.net/tripleo/+spec/send-mail-tool&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To speed up troubleshooting, debugging and reproducing TripleO tempest
results, we should have a list of people responsible for receiving email status
reports about tempest failures. Each report contains a list of all the failures,
and identifies the failures that are known issues already covered by an open bug
in Launchpad.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently there are periodic TripleO jobs running tempest, and nobody
verifies whether their results are failing or passing.
Even if someone is responsible for verifying these runs, it is still a manual
job: go to the logs web site, check what the latest job is, go to its logs,
verify that tempest ran, list the number of failures, check against a list
whether these failures are known failures or new ones, and only after all these
steps start working to identify the root cause of the problem.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;TripleO should provide a unified method for sending email to a list of
users who would be responsible for taking action when something goes wrong with
tempest results.
The method should run at the end of every run, in the validate-tempest role,
and read the log file, either from the output generated by tempest or from the
logs uploaded to the logs website, identify tempest failures and report them
by mail, or save the mail content in a file to be verified later. The mail
should contain information such as the list of failures, the list of known
failures, the date, a link to the logs of the run, and any other information
that might be relevant.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;One of the alternatives would be openstack-health, where the user can
subscribe to the RSS feed of one of the jobs using a third party application.
Right now, openstack-health doesn’t support user subscription or sending emails.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None, since it will use an API running in some cloud service to send the email,
so the username and password remain secure.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers in different teams will be more involved in TripleO CI debugging.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;arxcruz&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The script should be written in Python&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Should be part of validate-tempest role in tripleo-quickstart-extras&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Should be able to read the logs in any run in &lt;a class="reference external" href="http://logs.openstack.org"&gt;http://logs.openstack.org&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once it reads the log, it should collect information about the failures,
passing tests and known failures, or alternatively take the tempest output and
parse it directly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Be able to work with Jinja2 template to send email, so it’s
possible to have different templates for different types of job&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read the list of addresses that the report should be sent to.
The list is a dictionary mapping each email address to a list of tests
and/or jobs that the user is interested in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Render the template with the proper data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Send the report&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
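&lt;p&gt;The work items above can be sketched roughly as follows. This is an
illustration only, with several assumptions: the standard library
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;string.Template&lt;/span&gt;&lt;/code&gt; is used here in place of Jinja2 to keep the example
self-contained, the addresses and test names are made up, and matching a
recipient's interests by test name prefix is just one possible interpretation
of the dictionary described above.&lt;/p&gt;

```python
from string import Template

# Hypothetical recipients map: address to tests or jobs of interest.
RECIPIENTS = {
    "dev1@example.com": ["tempest.api.compute"],
    "dev2@example.com": ["tempest.api.network", "tempest.scenario"],
}

# Stand-in for a Jinja2 template; one template per job type is possible.
REPORT = Template(
    "Job: $job\nLogs: $log_url\n"
    "New failures:\n$new\nKnown failures:\n$known\n"
)


def build_reports(job, log_url, failures, known_failures):
    """Return a dict of address to mail body for interested recipients."""
    reports = {}
    for address, interests in RECIPIENTS.items():
        # Assumption: an interest matches a failure by name prefix.
        mine = [f for f in failures if f.startswith(tuple(interests))]
        if not mine:
            continue
        new = [f for f in mine if f not in known_failures]
        known = [f for f in mine if f in known_failures]
        reports[address] = REPORT.substitute(
            job=job,
            log_url=log_url,
            new="\n".join(new) or "(none)",
            known="\n".join(known) or "(none)",
        )
    return reports
```

&lt;p&gt;Sending the rendered bodies, or saving them to a file for later
verification, is left out of the sketch.&lt;/p&gt;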
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;As part of CI testing, the new tool should be used to send a
report to a list of interested people&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation should be updated to reflect the standard ways
to send the report and call the script at the end of every
periodic run.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Sagi mail tempest:
&lt;a class="reference external" href="https://github.com/sshnaidm/various/blob/master/check_tests.py"&gt;https://github.com/sshnaidm/various/blob/master/check_tests.py&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Mon, 28 Aug 2017 00:00:00 </pubDate></item><item><title>Add real-time compute nodes to TripleO</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/tripleo-realtime.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-realtime"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-realtime&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Real-time guest VMs require compute nodes with a specific configuration to
control the sources of latency spikes.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Manual configuration of compute nodes to support real-time guests is possible.
However, this is complex and time consuming when there is a large number of
compute nodes to configure.&lt;/p&gt;
&lt;p&gt;On a real-time compute node a subset of the available physical CPUs (pCPUs) are
isolated and dedicated to real-time tasks. The remaining pCPUs are dedicated to
general housekeeping tasks. This requires a real-time Linux Kernel and real-time
KVM that allow their housekeeping tasks to be isolated. The real-time and
housekeeping pCPUs typically reside on different NUMA nodes.&lt;/p&gt;
&lt;p&gt;Huge pages are also reserved for guest VMs to prevent page faults, either via
the kernel command line or via sysfs. Sysfs is preferable as it allows the
reservation on each individual NUMA node to be set.&lt;/p&gt;
&lt;p&gt;A real-time Linux guest VM is partitioned in a similar manner, having one or
more real-time virtual CPUs (vCPUs) and one or more general vCPUs to handle
the non real-time housekeeping tasks.&lt;/p&gt;
&lt;p&gt;A real-time vCPU is pinned to a real-time pCPU while a housekeeping vCPU is
pinned to a housekeeping pCPU.&lt;/p&gt;
&lt;p&gt;It is expected that operators would require both real-time and non real-time
compute nodes on the same overcloud.&lt;/p&gt;
&lt;section id="use-cases"&gt;
&lt;h3&gt;Use Cases&lt;/h3&gt;
&lt;p&gt;The primary use-case is NFV appliances deployed by the telco community which
require strict latency guarantees. Other latency sensitive applications should
also benefit.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;This spec proposes changes to automate the deployment of real-time capable
compute nodes using TripleO.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;a custom overcloud image for the real-time compute nodes, which shall include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;real-time Linux Kernel&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;real-time KVM&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;real-time tuned profiles&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;a new real-time compute role that is a variant of the existing compute role&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;huge pages shall be enabled on the real-time compute nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;huge pages shall be reserved for the real-time guests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU pinning shall be used to isolate kernel housekeeping tasks from the
real-time tasks by configuring tuned.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU pinning shall be used to isolate virtualization housekeeping tasks from
the real-time tasks by configuring nova.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Worst-case latency in real-time guest VMs should be significantly reduced.
However a real-time configuration potentially reduces the overall throughput of
a compute node.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The operator will remain responsible for:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;appropriate BIOS settings on compute nodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;setting appropriate parameters for the real-time role in an environment file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;post-deployment configuration&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;creating/modifying overcloud flavors to enable CPU pinning, hugepages,
dedicated CPUs, real-time policy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;creating host aggregates for real-time and non real-time compute nodes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Real-time &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-full&lt;/span&gt;&lt;/code&gt; image creation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;create a disk-image-builder element to include the real-time packages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add support for multiple overcloud images in python-tripleoclient CLIs:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="n"&gt;build&lt;/span&gt;
&lt;span class="n"&gt;openstack&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="n"&gt;upload&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Real-time compute role:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;create a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ComputeRealtime&lt;/span&gt;&lt;/code&gt; role&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;variant of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Compute&lt;/span&gt;&lt;/code&gt; role that can be configured and scaled
independently&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;allows a different image and flavor to be used for real-time nodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;includes any additional parameters/resources that apply to real-time nodes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;create a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;NovaRealtime&lt;/span&gt;&lt;/code&gt; service&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;contains a nested &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;NovaCompute&lt;/span&gt;&lt;/code&gt; service&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;allows parameters to be overridden for the real-time role only&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Nova configuration:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Nova &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vcpu_pin_set&lt;/span&gt;&lt;/code&gt; support is already implemented. See NovaVcpuPinSet in
&lt;a class="reference internal" href="#references"&gt;&lt;span class="std std-ref"&gt;References&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Kernel/system configuration:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;hugepages support&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;set default hugepage size (kernel cmdline)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;number of hugepages of each size to reserve at boot (kernel cmdline)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;number of hugepages of each size to reserve post boot on each NUMA node
(sysfs)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kernel CPU pinning&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;isolcpus option (kernel cmdline)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
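&lt;p&gt;As a rough illustration of the settings listed above, the sketch below builds
the kernel command line arguments and the per-NUMA-node sysfs path. The helper
names and the example values are hypothetical; the kernel parameters
(&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;default_hugepagesz&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hugepagesz&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;hugepages&lt;/span&gt;&lt;/code&gt;,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;isolcpus&lt;/span&gt;&lt;/code&gt;) and the sysfs layout are the standard Linux ones. How TripleO
would actually apply them (Tuned profiles, grub config) is not shown.&lt;/p&gt;

```python
def kernel_cmdline_args(default_size, boot_pages, isolated_cpus):
    """Build kernel command line arguments for huge pages and CPU isolation.

    ``boot_pages`` maps a huge page size (e.g. "1G") to a page count.
    Hypothetical helper for illustration only.
    """
    args = ["default_hugepagesz={0}".format(default_size)]
    for size, count in sorted(boot_pages.items()):
        # Each hugepagesz= applies to the hugepages= that follows it.
        args += ["hugepagesz={0}".format(size), "hugepages={0}".format(count)]
    args.append("isolcpus={0}".format(isolated_cpus))
    return " ".join(args)


def sysfs_hugepages_path(numa_node, size_kb):
    """Per-NUMA-node sysfs file holding the reserved huge page count."""
    return ("/sys/devices/system/node/node{0}/hugepages/"
            "hugepages-{1}kB/nr_hugepages".format(numa_node, size_kb))


print(kernel_cmdline_args("1G", {"1G": 16}, "2-7"))
print(sysfs_hugepages_path(0, 1048576))
```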
&lt;p&gt;Ideally this can be implemented outside of TripleO in the Tuned profiles, where
it is possible to set the kernel command line and manage sysfs. TripleO would
then manage the Tuned profile config files.
Alternatively the grub and systemd config files can be managed directly.&lt;/p&gt;
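&lt;p&gt;As a rough illustration of the kernel settings listed above, the following Python sketch assembles the relevant command line arguments and the sysfs path used for post-boot reservation. The kernel options (default_hugepagesz, hugepagesz, hugepages, isolcpus) are standard Linux parameters; the function names and example values are hypothetical and not part of this spec.&lt;/p&gt;

```python
def kernel_args(default_size, reservations, isolated_cpus):
    """Build kernel cmdline arguments for hugepages and CPU isolation.

    default_size:  default hugepage size, e.g. "1G"
    reservations:  list of (size, count) pairs to reserve at boot
    isolated_cpus: CPU list string for isolcpus, e.g. "2-7"
    """
    args = ["default_hugepagesz=" + default_size]
    for size, count in reservations:
        # Each hugepagesz= applies to the hugepages= that follows it.
        args.append("hugepagesz=" + size)
        args.append("hugepages=" + str(count))
    args.append("isolcpus=" + isolated_cpus)
    return " ".join(args)


def sysfs_hugepages_path(numa_node, size_kb):
    """Return the sysfs file that controls post-boot reservation on a NUMA node."""
    return ("/sys/devices/system/node/node%d/hugepages/hugepages-%dkB/nr_hugepages"
            % (numa_node, size_kb))


# Hypothetical example values, for illustration only.
print(kernel_args("1G", [("1G", 8), ("2M", 512)], "2-7"))
print(sysfs_hugepages_path(0, 2048))
```

&lt;p&gt;Whether these values end up in a Tuned profile or directly in the grub configuration, the same strings would need to be produced, which is one reason a single shared implementation with OVS-DPDK is attractive.&lt;/p&gt;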
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This requirement is shared with OVS-DPDK. The development should be
coordinated to ensure that a single implementation serves
both use cases.
Managing the grub config via a UserData script is the current approach used
for OVS-DPDK. See OVS-DPDK documentation in &lt;a class="reference internal" href="#references"&gt;&lt;span class="std std-ref"&gt;References&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;owalsh&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;ansiwen&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;As outlined in the proposed changes.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Libvirt real time instances
&lt;a class="reference external" href="https://blueprints.launchpad.net/nova/+spec/libvirt-real-time"&gt;https://blueprints.launchpad.net/nova/+spec/libvirt-real-time&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hugepages enabled in the Compute nodes.
&lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1589929"&gt;https://bugs.launchpad.net/tripleo/+bug/1589929&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU isolation of real-time and non real-time tasks.
&lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1589930"&gt;https://bugs.launchpad.net/tripleo/+bug/1589930&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tuned
&lt;a class="reference external" href="https://fedorahosted.org/tuned/"&gt;https://fedorahosted.org/tuned/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Genuine real-time guests are unlikely to be testable in CI:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;specific BIOS settings are required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;images with real-time Kernel and KVM modules are required&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, the workflow to deploy these guests should be testable in CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Manual steps performed by the operator shall be documented:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;BIOS settings for low latency&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real-time overcloud image creation&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;CentOS repos do not include RT packages. The CERN CentOS RT repository is an
alternative.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Flavor and profile creation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parameters required in a TripleO environment file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Post-deployment configuration&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;span id="id1"/&gt;&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Nova blueprint &lt;a class="reference external" href="https://blueprints.launchpad.net/nova/+spec/libvirt-real-time"&gt;“Libvirt real time instances”&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The requirements are similar to &lt;a class="reference internal" href="../newton/tripleo-ovs-dpdk.html"&gt;&lt;span class="doc"&gt;Adding OVS-DPDK to Tripleo&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;CERN CentOS 7 RT repo &lt;a class="reference external" href="http://linuxsoft.cern.ch/cern/centos/7/rt/"&gt;http://linuxsoft.cern.ch/cern/centos/7/rt/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;NovaVcpuPinSet parameter added: &lt;a class="reference external" href="https://review.openstack.org/#/c/343770/"&gt;https://review.openstack.org/#/c/343770/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;OVS-DPDK documentation (work-in-progress): &lt;a class="reference external" href="https://review.openstack.org/#/c/395431/"&gt;https://review.openstack.org/#/c/395431/&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Mon, 28 Aug 2017 00:00:00 </pubDate></item><item><title>TripleO PTP (Precision Time Protocol) Support</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/queens/tripleo-ptp.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-ptp"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-ptp&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec introduces support for a time synchronization method called PTP [0]
which provides better time accuracy than NTP in general. With hardware
timestamping support on the host, PTP can achieve clock accuracy in the
sub-microsecond range, making it suitable for measurement and control systems.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently TripleO deploys NTP services by default, which provide
millisecond-level time accuracy, but this is not enough for some cases:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Fault/Error events include timestamps on the associated event
messages, retrieved by detectors so that the time at which the event occurred
can be identified accurately. Given that the target Fault Management
cycle timelines are in the tens of milliseconds for the most critical faults,
event ordering may be reversed relative to actual time if the precision and
accuracy of clock synchronization are at that same level.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NFV C-RAN (Cloud Radio Access Network) is looking for better time
synchronization and distribution, with microsecond-level accuracy, as an
alternative to NTP. PTP has been evaluated as one of the candidate technologies.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This spec is not intended to cover all the possible ways of using PTP, but rather
to provide a basic deployment path for PTP in TripleO, with the default
configuration set to support a PTP Ordinary Clock (slave mode). The master mode
PTP clock configuration is not in the scope of this spec, but must be deployed
by the user to provide the time source for the PTP Ordinary Clock. Full support
for PTP capabilities can be added later, building on this spec.&lt;/p&gt;
&lt;p&gt;Users should be aware that NTP and PTP cannot be configured together
on the same node without a coordinator program such as timemaster, which is also
provided by the linuxptp package. How to configure and use timemaster is not in the
scope of this spec.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Provide the capability to configure PTP as the time synchronization method:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add PTP configuration file path in overcloud resource registry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add puppet-tripleo profile for PTP services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add tripleo-heat-templates composable service for PTP.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Retain the current default behavior to deploy NTP as time synchronization
source:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The NTP services remain unchanged as the default time synchronization method.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The NTP services must be disabled on nodes where PTP is deployed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The alternative is to continue to use NTP.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Security issues originating from PTP will need to be considered.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Users will get more accurate time from PTP.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;No impact in the default deployment mode, which uses NTP as the time source.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;An operator who wants to use PTP should identify and provide the name of the
PTP-capable network interface and make sure NTP is not deployed on the nodes where PTP
will be deployed. The default PTP network interface name is set to ‘nic1’; the
user should change it to match the real interface name. By default, PTP will
not be deployed unless explicitly configured.&lt;/p&gt;
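&lt;p&gt;To make the deployer-facing configuration more concrete, the sketch below renders a minimal ptp4l configuration for a slave-only Ordinary Clock. It assumes the INI-like layout of ptp4l configuration files (a global section followed by one section per interface) and the slaveOnly option name; both should be checked against the ptp4l man page for the linuxptp version in use. The function name and the interface eth0 are hypothetical, and this is not the file that the proposed puppet profile would generate.&lt;/p&gt;

```python
def render_ptp4l_conf(interface):
    """Render a minimal ptp4l configuration for a slave-only Ordinary Clock."""
    if interface.startswith("nic"):
        # 'nic1' is only the abstract default; ptp4l needs the real device name.
        raise ValueError("replace %r with the real interface name" % interface)
    lines = [
        "[global]",
        "slaveOnly 1",       # assumption: option name as used by ptp4l
        "",
        "[%s]" % interface,  # one section per PTP-capable interface
    ]
    return "\n".join(lines) + "\n"


# Hypothetical interface name, for illustration only.
print(render_ptp4l_conf("eth0"))
```

&lt;p&gt;Rejecting the placeholder name early is a design choice of this sketch: it surfaces the required operator input at render time rather than as a service failure on the node.&lt;/p&gt;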
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;zshi&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Puppet-tripleo profile for PTP services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tripleo-heat-templates composable service for PTP deployment&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Puppet module for PTP services: ptp [1]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The linuxptp RPM must be installed, and a PTP-capable NIC must be identified.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Refer to linuxptp project page [2] for the list of drivers that support the
PHC (Physical Hardware Clock) subsystem.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The deployment of PTP should be testable in CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The deployment documentation will need to be updated to cover the configuration of
PTP.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;[0] &lt;a class="reference external" href="https://standards.ieee.org/findstds/standard/1588-2008.html"&gt;https://standards.ieee.org/findstds/standard/1588-2008.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[1] &lt;a class="reference external" href="https://github.com/redhat-nfvpe/ptp"&gt;https://github.com/redhat-nfvpe/ptp&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[2] &lt;a class="reference external" href="http://linuxptp.sourceforge.net"&gt;http://linuxptp.sourceforge.net&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Thu, 13 Jul 2017 00:00:00 </pubDate></item><item><title>A unified tool for upgrading TripleO based deployments</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/tripleo-upgrade.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-upgrade"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-upgrade&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To avoid duplicated work and automation code that is out of sync with the
official documentation, we would like to create a single repository hosting the upgrade
automation code, which can be run on top of deployments done with various tools.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently automation code for TripleO upgrades is spread across several repositories
and it is tightly coupled with the framework being used for deployment, e.g.
tripleo-quickstart or Infrared.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Our proposal is to decouple the upgrade automation code and make it deployment tool
agnostic. This way it could be consumed in different scenarios such as CI, automated
or manual testing.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;For previous releases the automation code has been hosted in different repositories
such as tripleo-quickstart-extras, infrared or private repos. This is not convenient,
as they all cover basically the same workflow, so we are duplicating work. We would like
to avoid this and collaborate on a single repository.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;This tool allows users to run the TripleO upgrade either fully automated or
semi-automated, by creating scripts for each upgrade step that the user can later
run manually.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;This tool helps developers by providing a quick way to run TripleO upgrades. This could
be useful when reproducing and debugging reported issues.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;matbu, mcornea&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create a new repository in OpenStack Git&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migrate repository with its history from &lt;a class="reference external" href="https://github.com/redhat-openstack/tripleo-upgrade"&gt;https://github.com/redhat-openstack/tripleo-upgrade&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ansible&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Thu, 22 Jun 2017 00:00:00 </pubDate></item><item><title>Container Healthchecks for TripleO Services</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/container-healthchecks.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/container-healthchecks"&gt;https://blueprints.launchpad.net/tripleo/+spec/container-healthchecks&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;An OpenStack deployment involves many services spread across many
hosts. It is important that we provide tooling and APIs that make it
as easy as possible to monitor this large, distributed environment.
The move to containerized services in the overcloud [1]
brings with it many opportunities, such as the ability to bundle
services with their associated health checks and provide a standard
API for assessing the health of the service.&lt;/p&gt;
&lt;p&gt;[1]: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/containerize-tripleo"&gt;https://blueprints.launchpad.net/tripleo/+spec/containerize-tripleo&lt;/a&gt;&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The people who are in the best position to develop appropriate health
checks for a service are generally those people responsible for
developing the service.  Unfortunately, the task of setting up
monitoring generally ends up in the hands of cloud operators or some
intermediary.&lt;/p&gt;
&lt;p&gt;I propose that we take advantage of the bundling offered by
containerized services and create a standard API with which an
operator can assess the health of a service.  This makes life easier
for the operator, who can now provide granular service monitoring
without requiring detailed knowledge about every service, and it
allows service developers to ensure that services are monitored
appropriately.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The Docker engine (since version 1.12), as well as most higher-level
orchestration frameworks, provide a standard mechanism for validating
the health of a container.  Docker itself provides the
&lt;a class="reference external" href="https://docs.docker.com/engine/reference/builder/#healthcheck"&gt;HEALTHCHECK&lt;/a&gt; directive, while Kubernetes has explicit
support for &lt;a class="reference external" href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/"&gt;liveness and readiness probes&lt;/a&gt;.  Both
mechanisms work by executing a defined command inside the container,
and using the result of that execution to determine whether or not the
container is “healthy”.&lt;/p&gt;
&lt;p&gt;I propose that we explicitly support these interfaces in containerized
TripleO services through the following means:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Include in every container a &lt;cite&gt;/openstack/healthcheck&lt;/cite&gt; command that
will check the health of the containerized service, exit with
status &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;0&lt;/span&gt;&lt;/code&gt; if the service is healthy or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;1&lt;/span&gt;&lt;/code&gt; if not, and provide
a message on &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;stdout&lt;/span&gt;&lt;/code&gt; describing the nature of the error.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Include in every Docker image an appropriate &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;HEALTHCHECK&lt;/span&gt;&lt;/code&gt;
directive to utilize the script:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;HEALTHCHECK&lt;/span&gt; &lt;span class="n"&gt;CMD&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;healthcheck&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If Kubernetes becomes a standard part of the TripleO deployment
process, we may be able to implement liveness or readiness probes
using the same script:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;livenessProbe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;healthcheck&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
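&lt;p&gt;The contract described in item 1 (exit status 0 or 1, plus a message on stdout) can be sketched in Python as follows. This is only an illustration under stated assumptions: a plain TCP connection test stands in for a real service check, and the function names, host and port 5000 are hypothetical. A real healthcheck would normally query the service's own API.&lt;/p&gt;

```python
import socket


def check_tcp(host, port, timeout=2.0):
    """Return (exit_status, message) for a TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0, "OK: %s:%d is accepting connections" % (host, port)
    except OSError as exc:
        return 1, "FAIL: cannot connect to %s:%d (%s)" % (host, port, exc)


def healthcheck(host, port):
    """Print the result message on stdout and return the exit status."""
    status, message = check_tcp(host, port)
    print(message)
    return status


# Inside a container the script would end with something like:
#   raise SystemExit(healthcheck("127.0.0.1", 5000))
```

&lt;p&gt;Keeping the check in a function that returns the status, rather than exiting directly, lets the same logic be reused by a Docker HEALTHCHECK, a Kubernetes probe, or an operator running it by hand.&lt;/p&gt;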
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The alternative is the status quo: services do not provide a standard
healthcheck API, and service monitoring must be configured
individually by cloud operators.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Users can explicitly run the healthcheck script to immediately assess
the state of a service.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;This proposal will result in the periodic execution of tasks on the
overcloud hosts.  When designing health checks, service developers
should select appropriate check intervals such that there is minimal
operational overhead from the health checks.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will need to determine how best to assess the health of a
service and provide the appropriate script to perform this check.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This requires that we implement &lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/containerize-tripleo-overcloud.html"&gt;containerize-tripleo-overcloud&lt;/a&gt;
blueprint.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;TripleO CI jobs should be updated to utilize the healthcheck API to
determine if services are running correctly.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Any documentation describing the process of containerizing a service
for TripleO must be updated to describe the healthcheck API.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;N/A&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Wed, 17 May 2017 00:00:00 </pubDate></item><item><title>Sample Environment Generator</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/environment-generator.html</link><description>
 
&lt;p&gt;A common tool to generate sample Heat environment files would be beneficial
in two main ways:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Consistent formatting and details.  Every environment file would include
parameter descriptions, types, defaults, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ease of updating.  The parameters can be dynamically read from the templates
which allows the sample environments to be updated automatically when
parameters are added or changed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently our sample environments are hand written, with no consistency in
terms of what is included.  Most do not include a description of what all
the parameters do, and almost none include the types of the parameters or the
default values for them.&lt;/p&gt;
&lt;p&gt;In addition, the environment files often get out of date because developers
have to remember to manually update them any time they make a change to the
parameters for a given feature or service.  This is tedious and error-prone.&lt;/p&gt;
&lt;p&gt;The lack of consistency in environment files is also a problem for the UI,
which wants to use details from environments to improve the user experience.
When environments are created manually, these details are likely to be missed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;A new tool, similar to the oslo.config generator, will allow us to eliminate
these problems.  It will take some basic information about the environment and
use the parameter definitions in the templates to generate the sample
environment file.&lt;/p&gt;
&lt;p&gt;The resulting environments should contain the following information:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Human-readable Title&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;parameter_defaults describing all the available parameters for the
environment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optional resource_registry with any necessary entries&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Initially the title and description will simply be comments, but eventually we
would like to get support for those fields into Heat itself so they can be
top-level keys.&lt;/p&gt;
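&lt;p&gt;A minimal sketch of the kind of output described above is shown below. It is not the proposed tool: the function name, the input structure and the example parameter are hypothetical, and the title and description are emitted as comments, as the spec suggests for the initial version.&lt;/p&gt;

```python
def render_environment(title, description, parameters, resource_registry=None):
    """Render a sample Heat environment as YAML text.

    parameters maps a name to a dict with 'description', 'type' and 'default',
    mirroring what a template's parameter definition provides.
    """
    lines = ["# title: " + title, "# description: " + description,
             "parameter_defaults:"]
    for name in sorted(parameters):
        spec = parameters[name]
        lines.append("  # " + spec["description"])
        lines.append("  # Type: " + spec["type"])
        lines.append("  %s: %s" % (name, spec["default"]))
    if resource_registry:
        lines.append("resource_registry:")
        for key in sorted(resource_registry):
            lines.append("  %s: %s" % (key, resource_registry[key]))
    return "\n".join(lines) + "\n"


# Hypothetical input, for illustration only.
sample = render_environment(
    "Enable Example Service",
    "Sample environment for a made-up service.",
    {"ExampleWorkers": {"description": "Number of worker processes.",
                        "type": "number", "default": 4}},
)
print(sample)
```

&lt;p&gt;Because every value comes from the parameter definitions, regenerating the file after a template change keeps the descriptions, types and defaults in step without manual edits.&lt;/p&gt;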
&lt;p&gt;Ideally the tool would be able to update the capabilities map automatically as
well.  At some point there may be some refactoring done there to eliminate the
overlap, but during the transition period this will be useful.&lt;/p&gt;
&lt;p&gt;This is also a good opportunity to impose some organization on the environments
directory of tripleo-heat-templates.  Currently it is mostly a flat directory
that contains all of the possible environments.  It would be good to add
subdirectories that group related environments so they are easier to find.&lt;/p&gt;
&lt;p&gt;The non-generated environments will either be replaced by generated ones,
when that makes sense, or deprecated in favor of a generated environment.
In the latter case the old environments will be left for a cycle to give
users time to transition to the new environments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We could add more checks to the yaml-validate tool to ensure environment files
contain the required information, but this still requires more developer
time and doesn’t solve the maintenance problems as parameters change.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Users should get an improved deployment experience through more complete and
better documented sample environments.  Existing users who are referencing
the existing sample environments may need to switch to the new generated
environments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;No runtime performance impact.  Initial testing suggests that it may take a
non-trivial amount of time to generate all of the environments, but it’s not
something developers should have to do often.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;See End User Impact&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will need to write an entry in the input file for the tool rather
than directly writing sample environments.  The input format of the tool will
be documented, so this should not be too difficult.&lt;/p&gt;
&lt;p&gt;When an existing environment is deprecated in favor of a generated one, a
release note should be written by the developer making the change in order to
communicate it to users.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bnemec&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jtomasek&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Update the proposed tool to reflect the latest design decisions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Convert existing environments to be generated&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;No immediate dependencies, but in the long run we would like to have some
added functionality from Heat to allow these environments to be more easily
consumed by the UI.  However, it was agreed at the PTG that we would proceed
with this work and make the Heat changes in parallel so we can get some of
the benefits of the change as soon as possible.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Any environments used in CI should be generated with the tool.  We will want
to add a job that exercises the tool as well, probably a job that ensures any
changes in the patch under test are reflected in the environment files.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will need to document the format of the input file.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/253638/"&gt;Initial proposed version of the tool&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-environment-generator"&gt;https://etherpad.openstack.org/p/tripleo-environment-generator&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 28 Mar 2017 00:00:00 </pubDate></item><item><title>Adding New CI Jobs</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/policy/adding-ci-jobs.html</link><description>
 
&lt;p&gt;New CI jobs need to be added following a specific process in order to ensure
they don’t block patches unnecessarily and that they aren’t ignored by
developers.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;We need to have a process for adding CI jobs that is not going to result
in a lot of spurious failures due to the new jobs.  Bogus CI results force
additional rechecks and reduce developer/reviewer confidence in the results.&lt;/p&gt;
&lt;p&gt;In addition, maintaining CI jobs is a non-trivial task, and each one we add
increases the load on the team.  Hopefully having a process that requires the
involvement of the new job’s proposer makes it clear that the person/team
adding the job has a responsibility to help maintain it.  CI is everyone’s
problem.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="policy"&gt;
&lt;h2&gt;Policy&lt;/h2&gt;
&lt;p&gt;The following steps should be completed in the order listed when adding a new
job:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Create an experimental job or hijack an existing job for a single Gerrit
change.  See the references section for details on how to add a new job.
This job should be passing before moving on to the next step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify that the new job is providing a reasonable level of logging.  Not
too much, not too little.  Important logs, such as the OpenStack service
logs and basic system logs, are necessary to determine why jobs fail.
However, OpenStack Infra has to store the logs from an enormous number of
jobs, so it is also important to keep our log artifact sizes under control.
When in doubt, try to capture about the same amount of logs as the existing
jobs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Promote the job to check non-voting.  While the job should have been
passing prior to this, it most likely has not been run a significant number
of times, so the overall stability is still unknown.&lt;/p&gt;
&lt;p&gt;“Stable” in this case would be defined as not having significantly more
spurious failures than the ovb-ha job.  Due to the additional complexity of
an HA deployment, that job tends to fail for reasons unrelated to the patch
being tested more often than the other jobs.  We do not want to add any
jobs that are less stable.  Note that failures due to legitimate problems
being caught by the new job should not count against its stability.&lt;/p&gt;
&lt;div class="admonition important"&gt;
&lt;p class="admonition-title"&gt;Important&lt;/p&gt;
&lt;p&gt;Before adding OVB jobs to the check queue, even as
non-voting, please check with the CI admins to ensure there is enough
OVB capacity to run a large number of new jobs.  As of this writing,
the OVB cloud capacity is significantly more constrained than regular
OpenStack Infra.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A job should remain in this state until it has been proven stable over a
period of time.  A good rule of thumb would be that after a week of
stability the job can and should move to the next step.&lt;/p&gt;
&lt;div class="admonition important"&gt;
&lt;p class="admonition-title"&gt;Important&lt;/p&gt;
&lt;p&gt;Jobs should not remain non-voting indefinitely.  This causes
reviewers to ignore the results anyway, so the jobs become a waste of
resources.  Once a job is believed to be stable, it should be made
voting as soon as possible.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To assist with confirming the stability of a job, it should be added to the
&lt;a class="reference external" href="http://tripleo.org/cistatus.html"&gt;CI Status&lt;/a&gt; page at this point.  This
can actually be done at any time after the job is moved to the check queue,
but must be done before the job becomes voting.&lt;/p&gt;
&lt;p&gt;Additionally, contact Sagi Shnaidman (sshnaidm on IRC) to get the job
added to the &lt;a class="reference external" href="http://status-tripleoci.rhcloud.com/"&gt;Extended CI Status&lt;/a&gt;
page.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Send an e-mail to openstack-dev, tagged with [tripleo], that explains the
purpose of the new job and notifies people that it is about to be made
voting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make the job voting.  At this point there should be sufficient confidence
in the job that reviewers can trust the results and should not merge
anything which does not pass it.&lt;/p&gt;
&lt;p&gt;In addition, be aware that voting multinode jobs are also gating.  If the
job fails the patch cannot merge.  This means a broken job can block all
TripleO changes from merging.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep an eye on the &lt;a class="reference external" href="http://tripleo.org/cistatus.html"&gt;CI Status&lt;/a&gt; page to
ensure the job keeps running smoothly.  If it starts to fail unusually
often, please investigate.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="alternatives-history"&gt;
&lt;h2&gt;Alternatives &amp;amp; History&lt;/h2&gt;
&lt;p&gt;Historically, a number of jobs have been added to the check queue when they
were completely broken.  This is bad and reduces developer and reviewer
confidence in the CI results.  It can also block TripleO changes from merging
if the broken job is gating.&lt;/p&gt;
&lt;p&gt;We also have a bad habit of leaving jobs in the non-voting state, which makes
them fairly worthless since reviewers will not respect the results.  Per
this policy, we should clean up all of the non-voting jobs by either moving
them back to experimental, or stabilizing them and making them voting.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="author-s"&gt;
&lt;h3&gt;Author(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary author:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bnemec&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="milestones"&gt;
&lt;h3&gt;Milestones&lt;/h3&gt;
&lt;p&gt;This policy would go into effect immediately.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;This policy is mostly targeted at new jobs, but we do have a number of
non-voting jobs that should be brought into compliance with it.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/infra/manual/"&gt;OpenStack Infra Manual&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.openstack.org/infra/manual/drivers.html#running-jobs-with-zuul"&gt;Adding a New Job&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="revision-history"&gt;
&lt;h2&gt;Revision History&lt;/h2&gt;
&lt;table class="docutils align-default" id="id2"&gt;
&lt;caption&gt;&lt;span class="caption-text"&gt;Revisions&lt;/span&gt;&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Release Name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Pike&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Introduced&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0
Unported License.
&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Wed, 15 Mar 2017 00:00:00 </pubDate></item><item><title>Deployment Plan Management changes</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/deployment-plan-management.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/deployment-plan-management-refactor"&gt;https://blueprints.launchpad.net/tripleo/+spec/deployment-plan-management-refactor&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The goal of this work is to improve GUI and CLI interoperability by changing the way
deployment configuration is stored, making it more compact and simplifying plan import
and export.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The problem is broadly described in mailing list discussion [1]. This spec is a result
of agreement achieved in that discussion.&lt;/p&gt;
&lt;p&gt;The TripleO-Common library currently stores plan configuration in a Mistral environment,
but not all of the data lives there. Additional files (roles_data.yaml, network_data.yaml,
capabilities-map.yaml) also define plan configuration and are currently used by the CLI to
drive certain parts of deployment configuration. As a result, the content of those files
has to be synchronized with the Mistral environment whenever a plan is imported or
exported.&lt;/p&gt;
&lt;p&gt;TripleO-Common also needs to provide a means of managing roles and networks.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;TripleO plan configuration data should be stored in a single place rather than in
several (the Mistral environment plus plan meta files stored in a Swift container).&lt;/p&gt;
&lt;p&gt;TripleO-Common should move from using the Mistral environment to storing the information
in a file (plan-environment.yaml) in the Swift container, so that all plan configuration data
is stored in ‘meta’ files in Swift and tripleo-common provides an API to perform operations
on this data.&lt;/p&gt;
&lt;p&gt;Plan meta files: capabilities-map.yaml, roles_data.yaml, network_data.yaml [3],
plan-environment.yaml&lt;/p&gt;
&lt;p&gt;Proposed plan-environment.yaml file structure:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;

&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="n"&gt;which&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="n"&gt;describes&lt;/span&gt;
&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="s1"&gt;'s usage and potential summary of features it provides&lt;/span&gt;
&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="n"&gt;environments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;puppet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="n"&gt;parameter_defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;ControllerCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;passwords&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;TrovePassword&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"vEPKFbdpTeesCWRmtjgH4s7M8"&lt;/span&gt;
  &lt;span class="n"&gt;PankoPassword&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"qJJj3gTg8bTCkbtYtYVPtzcyz"&lt;/span&gt;
  &lt;span class="n"&gt;KeystoneCredential0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"Yeh1wPLUWz0kiugxifYU19qaf5FADDZU31dnno4gJns="&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;With this solution, the whole plan configuration is stored in the Swift container together
with the rest of the plan files. Plan import/export is simpler because no synchronization
is necessary between the Swift files and the Mistral environment, plan configuration is
more straightforward, and CLI/GUI interoperability is improved.&lt;/p&gt;
&lt;p&gt;Initially the plan configuration is going to be split into multiple ‘meta’ files
(plan-environment.yaml, capabilities-map.yaml, roles_data.yaml, network_data.yaml),
all stored in the Swift container.
As a next step we can evaluate a solution which merges them all into plan-environment.yaml.&lt;/p&gt;
&lt;p&gt;In the CLI workflow, the user works with local files. The plan, networks and roles are configured by
making changes directly in the relevant files (plan-environment.yaml, roles_data.yaml, …).
The plan is created and templates are generated by the deploy command.&lt;/p&gt;
&lt;p&gt;TripleO Common library will implement CRUD actions for Roles and Networks
management. This will allow clients to manage Roles and Networks and generate relevant
templates (see work items).&lt;/p&gt;
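&lt;p&gt;As a rough illustration of the data these actions would read and write, a minimal
roles_data.yaml might look like the sketch below. The role names, counts and service
lists are assumptions chosen for illustration and are not defined by this spec:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;# Illustrative sketch only: roles and services are assumed, not mandated by this spec
- name: Controller
  CountDefault: 1
  ServicesDefault:
    - OS::TripleO::Services::Keystone
    - OS::TripleO::Services::GlanceApi
- name: Compute
  CountDefault: 1
  ServicesDefault:
    - OS::TripleO::Services::NovaCompute
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Under this layout, a role-listing action would return the list above in json format, and
a role-update action would validate a modified list and persist it back to the same file.&lt;/p&gt;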
&lt;p&gt;The TripleO UI and other clients use the tripleo-common library, which operates on the plan stored in
the Swift container.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;An alternative approach is to treat the Swift ‘meta’ files as an input during plan creation
and synchronize them to the Mistral environment when the plan is imported. This approach was
initially described in [1] and is used in the current plan import/export implementation [2].&lt;/p&gt;
&lt;p&gt;This solution needs to deal with multiple race conditions, makes plan import/export
much more complicated, and is not simple to understand overall. It should only be
considered if using the Mistral environment as plan configuration
storage offered some benefit over using a file in Swift, which is not the case
according to the discussion [1].&lt;/p&gt;
&lt;p&gt;As a subsequent step to the proposed solution, it is possible to join all existing
‘meta’ files into a single one.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;CLI/GUI interoperability is improved.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;This change makes Deployment Plan import/export functionality much simpler and
makes tripleo-common operate on the same set of files as the CLI does. It is much
easier for CLI users to understand how tripleo-common works, as it does not do any
Swift files -&amp;gt; Mistral environment synchronization in the background.&lt;/p&gt;
&lt;p&gt;TripleO-Common can introduce functionality to manage Roles and Networks that
matches how the CLI workflow does it.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;akrivoka&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;d0ugal&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;rbrady&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;jtomasek&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;[tripleo-heat-templates] Update plan-environment.yaml to match new specification.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/update-plan-environment-yaml"&gt;https://blueprints.launchpad.net/tripleo/+spec/update-plan-environment-yaml&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[tripleo-common] Update relevant actions to store data in plan-environment.yaml in
Swift instead of using mistral-environment. Migrate any existing data away from Mistral.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/stop-using-mistral-env"&gt;https://blueprints.launchpad.net/tripleo/+spec/stop-using-mistral-env&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[tripleo-common] On plan creation/update, tripleo-common validates the plan, checks
that roles_data.yaml and network_data.yaml exist, and validates their format.
On success, templates are generated/regenerated.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/validate-roles-networks"&gt;https://blueprints.launchpad.net/tripleo/+spec/validate-roles-networks&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[tripleo-common] Provide a GetRoles action to list current roles in json format by reading
roles_data.yaml.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/get-roles-action"&gt;https://blueprints.launchpad.net/tripleo/+spec/get-roles-action&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[tripleo-common] Provide a GetNetworks action to list current networks in json format
by reading network_data.yaml.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/get-networks-action"&gt;https://blueprints.launchpad.net/tripleo/+spec/get-networks-action&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[tripleo-common] Provide an UpdateRoles action to update Roles. It takes data in
json format, validates its contents and persists them in roles_data.yaml. After a
successful update, templates are regenerated.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/update-roles-action"&gt;https://blueprints.launchpad.net/tripleo/+spec/update-roles-action&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[tripleo-common] Provide an UpdateNetworks action to update Networks. It takes data in
json format, validates its contents and persists them in network_data.yaml.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/update-networks-action"&gt;https://blueprints.launchpad.net/tripleo/+spec/update-networks-action&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[tripleo-ui] Provide a way to create/list/update/delete Roles by calling tripleo-common
actions.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/roles-crud-ui"&gt;https://blueprints.launchpad.net/tripleo/+spec/roles-crud-ui&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[tripleo-ui] Provide a way to create/list/update/delete Networks by calling tripleo-common
actions.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/networks-crud-ui"&gt;https://blueprints.launchpad.net/tripleo/+spec/networks-crud-ui&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[tripleo-ui] Provide a way to assign Networks to Roles.&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/networks-roles-assignment-ui"&gt;https://blueprints.launchpad.net/tripleo/+spec/networks-roles-assignment-ui&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[python-tripleoclient] Update the CLI to use tripleo-common actions for operations
that currently modify the Mistral environment.&lt;/p&gt;
&lt;p&gt;related bug: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1635409"&gt;https://bugs.launchpad.net/tripleo/+bug/1635409&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The feature will be tested as part of TripleO CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation should be updated to reflect the new capabilities of the GUI (Roles/Networks management),
how to use plan-environment.yaml in the CLI workflow, and CLI/GUI interoperability using the plan import
and export features.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2017-February/111433.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2017-February/111433.html&lt;/a&gt;
[2] &lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/gui-plan-import-export.html"&gt;https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/gui-plan-import-export.html&lt;/a&gt;
[3] &lt;a class="reference external" href="https://review.openstack.org/#/c/409921/"&gt;https://review.openstack.org/#/c/409921/&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 28 Feb 2017 00:00:00 </pubDate></item><item><title>AIDE - Intrusion Detection Database</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/pike/aide-database.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-aide-database"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-aide-database&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;AIDE (Advanced Intrusion Detection Environment) is a file and directory
integrity verification system. It computes a checksum of object
attributes, which are then stored into a database. Operators can then
run periodic checks against the current state of defined objects and
verify if any attributes have been changed (thereby suggesting possible
malicious / unauthorised tampering).&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Security Frameworks such as DISA STIG [1] / CIS [3] require that AIDE be
installed and configured on all Linux systems.&lt;/p&gt;
&lt;p&gt;To enable OpenStack operators to comply with the aforementioned security
requirements, they require a method of automating the installation of
AIDE and initialization of AIDE’s integrity Database. They also require
a means to perform a periodic integrity verification run.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Introduce a puppet-module to manage the AIDE service. The module ensures the AIDE
application is installed and creates rule entries, along with either a CRON job for
periodic checks of the AIDE database or templates that allow monitoring
via Sensu checks as part of OpTools.&lt;/p&gt;
&lt;p&gt;Create a tripleo-heat-template service to allow population of hiera data
to be consumed by the puppet-module managing AIDE.&lt;/p&gt;
&lt;p&gt;The proposed puppet-module is lhinds-aide [2], as this module accepts
rules declared in hiera data, initializes the database and enables CRON
entries. Other puppet AIDE modules were missing hiera functionality or
other features (such as CRON population).&lt;/p&gt;
&lt;p&gt;Within tripleo-heat-templates, a composable service will be created to
feed a rule hash into the AIDE puppet module as follows:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;AIDERules:
  description: Mapping of AIDE config rules
  type: json
  default: {}
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The Operator can then source an environment file and provide rule
information as a hash:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;parameter_defaults:
  AIDERules:
    'Monitor /etc for changes':
      content: '/etc p+sha256'
      order: 1
    'Monitor /boot for changes':
      content: '/boot p+u+g+a'
      order: 2
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="ops-tool-integration"&gt;
&lt;h3&gt;Ops Tool Integration&lt;/h3&gt;
&lt;p&gt;In order to allow active monitoring of AIDE events, a sensu check can
be created to perform an interval based verification of AIDE monitored
files (set using &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;AIDERules&lt;/span&gt;&lt;/code&gt;) against the last initialized database.&lt;/p&gt;
&lt;p&gt;Results of the Sensu activated AIDE verification checks will then be fed
to the sensu server for alerting and archiving.&lt;/p&gt;
&lt;p&gt;The Sensu clients (all overcloud nodes) will be configured with a
standalone/passive check via the puppet-sensu module, which is already
installed on the overcloud image.&lt;/p&gt;
&lt;p&gt;If the Operator should choose not to use OpTools, then they can still
configure AIDE using the traditional method by means of a CRON entry.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Using a puppet-module coupled with a TripleO service is the most
pragmatic approach to populating AIDE rules and managing the AIDE
service.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;AIDE is an integrity checking application and therefore requires
Operators to ensure that AIDE’s database is protected from
tampering. Should an attacker get access to the database, they could
attempt to hide malicious activity by removing records of file integrity
hashes.&lt;/p&gt;
&lt;p&gt;The default location is currently &lt;cite&gt;/var/lib/aide/$database&lt;/cite&gt; which
puppet-aide sets with privileges of &lt;cite&gt;0600&lt;/cite&gt; and ownership of
&lt;cite&gt;root root&lt;/cite&gt;.&lt;/p&gt;
&lt;p&gt;AIDE itself introduces no security impact to any OpenStack projects
and has no interaction with any OpenStack services.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The service interaction will occur via heat templates and the TripleO
UI (should a capability map be present).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;No Performance Impact&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The service will be utilised by means of an environment file. Therefore,
should a deployer not reference the environment template using the
&lt;cite&gt;openstack overcloud deploy -e&lt;/cite&gt; flag, there will be no impact.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;No impact on other OpenStack Developers.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;lhinds&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Add puppet-aide [2] to RDO as a puppet package&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create TripleO Service for AIDE&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create Capability Map&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create CI Job&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Submit documentation to tripleo-docs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Dependency on lhinds-aide Puppet Module.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Will be tested in TripleO CI by adding the service and an environment
template to a TripleO CI scenario.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation patches will be made to explain how to use the service.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Original Launchpad issue: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1665031"&gt;https://bugs.launchpad.net/tripleo/+bug/1665031&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://www.stigviewer.com/stig/red_hat_enterprise_linux_6/2016-07-22/finding/V-38489"&gt;https://www.stigviewer.com/stig/red_hat_enterprise_linux_6/2016-07-22/finding/V-38489&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[2] &lt;a class="reference external" href="https://forge.puppet.com/lhinds/aide"&gt;https://forge.puppet.com/lhinds/aide&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[3]
&lt;a class="reference external" href="file:///home/luke/project-files/tripleo-security-hardening/CIS_Red_Hat_Enterprise_Linux_7_Benchmark_v2.1.0.pdf"&gt;file:///home/luke/project-files/tripleo-security-hardening/CIS_Red_Hat_Enterprise_Linux_7_Benchmark_v2.1.0.pdf&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Fri, 24 Feb 2017 00:00:00 </pubDate></item><item><title>Enable deployment of performance monitoring</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/tripleo-opstools-performance-monitoring.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-opstools-performance-monitoring"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-opstools-performance-monitoring&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;TripleO should be able to automatically install and set up
the performance monitoring agent (collectd) to service the overcloud.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;We need to make it easy for operators to connect overcloud nodes to a performance
monitoring stack. One way to do so is to install the collectd agent
together with a set of plugins, chosen according to the metrics we want to collect
from overcloud nodes.&lt;/p&gt;
&lt;p&gt;Summary of use cases:&lt;/p&gt;
&lt;p&gt;1. collectd deployed on each overcloud node, reporting configured metrics
(via collectd plugins) to an external collector.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The collectd service will be deployed as a composable service on
the overcloud stack when it is explicitly enabled via an environment file.&lt;/p&gt;
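&lt;p&gt;A minimal sketch of such an environment file is shown below. The resource registry
key, the template path and the parameter names are assumptions for illustration; the actual
names are determined by the implementation and are not fixed by this spec:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;# Hypothetical environment file enabling the collectd composable service
resource_registry:
  # Assumed service key and template location
  OS::TripleO::Services::Collectd: ../puppet/services/metrics/collectd.yaml

parameter_defaults:
  # Assumed parameter names: address and port of the external collector
  CollectdServer: collectd.example.com
  CollectdServerPort: 25826
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A deployer would pass a file like this to the deploy command with the -e flag; without it,
the service would not be deployed.&lt;/p&gt;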
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Metric collection and transport to the monitoring node can create I/O, which
might have a performance impact on monitored nodes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Lars Kellogg-Stedman (larsks)&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Martin Magr (mmagr)&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;puppet-tripleo profile for collectd service&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates composable service for collectd deployment&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Puppet module for collectd service: puppet-collectd [1]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CentOS Opstools SIG repo [2]&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We should consider creating a CI job that deploys an overcloud with a
monitoring node to perform functional testing.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;New template parameters will have to be documented.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://github.com/voxpupuli/puppet-collectd"&gt;https://github.com/voxpupuli/puppet-collectd&lt;/a&gt;
[2] &lt;a class="reference external" href="https://wiki.centos.org/SpecialInterestGroup/OpsTools"&gt;https://wiki.centos.org/SpecialInterestGroup/OpsTools&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Wed, 14 Dec 2016 00:00:00 </pubDate></item><item><title>Spec Review Process</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/policy/spec-review.html</link><description>
 
&lt;p&gt;Document the existing process to help reviewers, especially newcomers,
understand how to review specs. This is migrating the existing wiki
documentation into a policy.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Care should be taken when approving specs. An approved spec, and an
associated blueprint, indicate that the proposed change has some
priority for the TripleO project. We don’t want a bunch of approved
specs sitting out there that no community members are owning or working
on. We also want to make sure that our specs and blueprints are easy to
understand and have sufficient detail to effectively communicate
the intent of the change. The more effective the communication, the
more likely we are to elicit meaningful feedback from the wider
community.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="policy"&gt;
&lt;h2&gt;Policy&lt;/h2&gt;
&lt;p&gt;To this end, we should be cognizant of the following checklist when
reviewing and approving specs.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Broad feedback from interested parties.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We should do our best to elicit feedback from operators,
non-TripleO developers, end users, and the wider OpenStack
community in general.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mail the appropriate lists, such as openstack-operators and
openstack-dev to ask for feedback. Respond to feedback on the list,
but also encourage direct comments on the spec itself, as those
will be easier for other spec reviewers to find.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Overall consensus&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Check for a general consensus in the spec.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Do reviewers agree this change is meaningful for TripleO?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If they don’t have a vested interest in the change, are they at
least not objecting to the change?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Review older patchsets to make sure everything has been addressed&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Have any reviewers raised objections in previous patchsets that
were not addressed?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Have any potential pitfalls been pointed out that have not been
addressed?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Impact/Security&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ensure that the various Impact (end user, deployer, etc) and
Security sections in the spec have some content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;These aren’t sections to just gloss over after understanding
the implementation and proposed change. They are actually the most
important sections.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It would be nice if that content had elicited some feedback. If it
didn’t, that’s probably a good sign that the author and/or
reviewers have not yet thought about these sections carefully.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ease of understandability&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The spec should be easy to understand for those reviewers who are
familiar with the project. While the implementation may contain
technical details that not everyone will grasp, the overall
proposed change should be able to be understood by folks generally
familiar with TripleO. Someone who is generally familiar with
TripleO is likely someone who has run through the undercloud
install, perhaps contributed some code, or participated in reviews.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To aid in comprehension, grammar nits should generally be corrected
when they have been pointed out. Be aware though that even nits can
cause disagreements, as folks pointing out nits may be wrong
themselves. Do not bikeshed over solving disagreements on nits.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implementation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Does the implementation make sense?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Are there alternative implementations, perhaps easier ones, and if
so, have those been listed in the Alternatives section?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Are reasons for discounting the Alternatives listed in the spec?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ownership&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Is the spec author the primary assignee?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If not, has the primary assignee reviewed the spec, or at least
commented that they agree that they are the primary assignee?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reviewer workload&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Specs turn into patches to codebases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A +2 on a spec means that the core reviewer intends to review the
patches associated with that spec in addition to their other core
commitments for reviewer workload.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A +1 on a spec from a core reviewer indicates that the core
reviewer is not necessarily committing to review that spec’s
patches.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It’s fine to +2 even if the spec also relates to other repositories
and areas of expertise, in addition to the reviewer’s own. We
probably would not want to merge any spec that spanned multiple
specialties without a representative from each group adding their
+2.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Have any additional (perhaps non-core) reviewers volunteered to
review patches that implement the spec?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There should be a sufficient number of core reviewers who have
volunteered to go above and beyond their typical reviewer workload
(indicated by their +2) to review the relevant patches. A
“sufficient number” is dependent on the individual spec and the
scope of the change.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If reviewers have said they’ll be reviewing a spec’s patches
instead of patches they’d review otherwise, that doesn’t help much
and is actually harmful to the overall project.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="alternatives-history"&gt;
&lt;h2&gt;Alternatives &amp;amp; History&lt;/h2&gt;
&lt;p&gt;This is migrating the already agreed upon policy from the wiki.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="author-s"&gt;
&lt;h3&gt;Author(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary author:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;james-slagle (from the wiki history)&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jpichon&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="milestones"&gt;
&lt;h3&gt;Milestones&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;Once the policy has merged, an email should be sent to openstack-dev
referring to this document.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Original documentation: &lt;a class="reference external" href="https://wiki.openstack.org/wiki/TripleO/SpecReviews"&gt;https://wiki.openstack.org/wiki/TripleO/SpecReviews&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="revision-history"&gt;
&lt;h2&gt;Revision History&lt;/h2&gt;
&lt;table class="docutils align-default" id="id1"&gt;
&lt;caption&gt;&lt;span class="caption-text"&gt;Revisions&lt;/span&gt;&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class="row-odd"&gt;&lt;th class="head"&gt;&lt;p&gt;Release Name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Description&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class="row-even"&gt;&lt;td&gt;&lt;p&gt;Ocata&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;Migrated from wiki&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0
Unported License.
&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
</description><pubDate>Tue, 06 Dec 2016 00:00:00 </pubDate></item><item><title>TripleO Repo Management Tool</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/tripleo-repos.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/tripleo-repos"&gt;https://blueprints.launchpad.net/tripleo/tripleo-repos&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Create a tool to handle the repo setup for TripleO&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The documented repo setup steps for TripleO are currently:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;3 curls&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;a sed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;a multi-line bash command&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;a yum install&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;(optional) another yum install and sed command&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These steps are also implemented in multiple other places, which means every
time a change needs to be made it has to be done in at least three different
places. The stable branches also need slightly different commands, which further
complicates the documentation. The steps also need to appear in multiple places
in the docs (e.g. virt system setup, undercloud install, image build,
undercloud upgrade).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;My proposal is to abstract away the repo management steps into a standalone
tool.  This would essentially change the repo setup from the process
described above to something like:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;sudo&lt;/span&gt; &lt;span class="n"&gt;yum&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;repos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rpm&lt;/span&gt;
&lt;span class="n"&gt;sudo&lt;/span&gt; &lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;repos&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Historical note: The original proposal was called dlrn-repo because it was
dealing exclusively with dlrn repos.  Now that we’ve started to add more
repos like Ceph that are not from dlrn, that name doesn’t really make sense.&lt;/p&gt;
&lt;p&gt;This will mean that when repo setup changes are needed (which happens
periodically), they only need to be made in one place and will apply to both
developer and user environments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Use tripleo.sh’s repo setup.  However, tripleo.sh is not intended as a
user-facing tool.  It’s supposed to be a thin wrapper that essentially
implements the documented deployment commands.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The tool would need to make changes to the system’s repo setup and install
packages.  This is the same thing done by the documented commands today.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;This would be a new user-facing CLI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;No meaningful change&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Deployers would need to switch to this new method of configuring the
TripleO repos in their deployments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;There should be little to no developer impact because they are mostly using
other tools to set up their repos, and those tools should be converted to use
the new tool.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bnemec&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&amp;lt;launchpad-id or None&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Update the proposed tool to match the current repo setup&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Import code into gerrit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Package tool&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Publish the package somewhere easily accessible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update docs to use tool&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Convert existing developer tools to use this tool&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;NA&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;tripleo.sh would be converted to use this tool so it would be covered by
existing CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation would be simplified.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Original proposal:
&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-June/097221.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2016-June/097221.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Current version of the tool:
&lt;a class="reference external" href="https://github.com/cybertron/dlrn-repo"&gt;https://github.com/cybertron/dlrn-repo&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 15 Nov 2016 00:00:00 </pubDate></item><item><title>GUI: Import/Export Deployment Plan</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/gui-plan-import-export.html</link><description>
 
&lt;p&gt;Add two features to TripleO UI:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Import a deployment plan with a Mistral environment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Export a deployment plan with a Mistral environment&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/gui-plan-import-export"&gt;https://blueprints.launchpad.net/tripleo/+spec/gui-plan-import-export&lt;/a&gt;&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Right now, the UI only supports simple plan creation. The user needs to upload
the plan files, make the environment selection and set the parameters. We want
to add a plan import feature which would allow the user to import the plan
together with a complete Mistral environment. This way the selection of the
environment and parameters would be stored and automatically imported, without
any need for manual configuration.&lt;/p&gt;
&lt;p&gt;Conversely, we want to allow the user to export a plan together with a Mistral
environment, using the UI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;In order to identify the Mistral environment when importing a plan, I propose
we use a JSON formatted file and name it ‘plan-environment.json’. This file
should be uploaded to the Swift container together with the rest of the
deployment plan files. The convention of calling the file with a fixed name is
enough for it to be detected. Once this file is detected by the tripleo-common
workflow handling the plan import, the workflow then creates (or updates) the
Mistral environment using the file’s contents. In order to avoid possible future
unintentional overwriting of the environment, the workflow should delete this file
once it has created (or updated) the Mistral environment with its contents.&lt;/p&gt;
&lt;p&gt;Exporting the plan should consist of downloading all the plan files from the
swift container, adding the plan-environment.json, and packing it all up in
a tarball.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;One alternative is what we have now, i.e. making the user perform all the
environment configuration settings and parameter settings manually each time.
This is obviously very tedious and the user experience suffers greatly as a
result.&lt;/p&gt;
&lt;p&gt;The alternative to deleting the plan-environment.json file upon its
processing is to leave it in the swift container and keep it in sync with all
the updates that might happen thereafter. This can get very complicated and is
entirely unnecessary, so deleting the file instead is a better choice.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The import and export features will only be triggered on demand (user clicks
on button, or similar), so they will have no performance impact on the rest
of the application.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;akrivoka&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jtomasek,
d0ugal&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;tripleo-common: Enhance plan creation/update to consume plan-environment.json&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/enhance-plan-creation-with-plan-environment-json"&gt;https://blueprints.launchpad.net/tripleo/+spec/enhance-plan-creation-with-plan-environment-json&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-common: Add plan export workflow&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/plan-export-workflow"&gt;https://blueprints.launchpad.net/tripleo/+spec/plan-export-workflow&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;python-tripleoclient: Add plan export command&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/plan-export-command"&gt;https://blueprints.launchpad.net/tripleo/+spec/plan-export-command&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-ui: Integrate plan export into UI&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/plan-export-gui"&gt;https://blueprints.launchpad.net/tripleo/+spec/plan-export-gui&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: We don’t need any additional UI (neither GUI nor CLI) for plan import - the
existing GUI elements and CLI command for plan creation can be used for import
as well.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The changes should be covered by unit tests in tripleo-ui, tripleo-common and
python-tripleoclient.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;User documentation should be enhanced by adding instructions on how these two
features are to be used.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Mon, 14 Nov 2016 00:00:00 </pubDate></item><item><title>Enable deployment of alternative backends for oslo.messaging</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/om-dual-backends.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/om-dual-backends"&gt;https://blueprints.launchpad.net/tripleo/+spec/om-dual-backends&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec describes adding two functional capabilities to the messaging
services of an overcloud deployment. The first capability is to enable
the selection and configuration of separate messaging backends for
oslo.messaging RPC and Notification communications. The second
capability is to introduce support for a brokerless messaging backend
for oslo.messaging RPC communications via the AMQP 1.0 Apache
qpid-dispatch-router.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The oslo.messaging library supports the deployment of dual messaging system
backends. This enables alternative backends to be deployed for RPC and
Notification messaging communications. Users have identified the
constraints of using a store and forward (broker based) messaging system for RPC
communications and are seeking direct messaging (brokerless)
approaches to optimize the RPC messaging pattern. In addition to
operational challenges, emerging distributed cloud architectures
define requirements around peer-to-peer relationships and geo-locality
that can be addressed through intelligent messaging transport routing
capabilities such as those provided by the AMQP 1.0 qpid-dispatch-router.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Provide the capability to select and configure alternative
transport_urls for oslo.messaging RPCs and Notifications across
overcloud OpenStack services.&lt;/p&gt;
&lt;p&gt;Retain the current default behavior to deploy the rabbitMQ server as
the messaging backend for both RPC and Notification communications.&lt;/p&gt;
&lt;p&gt;Introduce an alternative deployment of the qpid-dispatch-router as the
messaging backend for RPC communications.&lt;/p&gt;
&lt;p&gt;Utilize the oslo.messaging AMQP 1.0 driver for delivering RPC services
via the dispatch-router messaging backend.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The configuration of dual backends for oslo.messaging could be
performed post overcloud deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The end result of using the AMQP 1.0 dispatch-router as an alternative
messaging backend for oslo.messaging RPC communications should be the
same from a security standpoint. The driver/router solution provides
SSL and SASL support in parity to the current rabbitMQ server deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The configuration of the dual backends for RPC and Notification
messaging communications should be transparent to the operation of the OpenStack
services.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Using a dispatch-router mesh topology rather than broker clustering
for messaging communications will have a positive impact on
performance and scalability by:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Directly expanding connection capacity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Providing parallel communication flows across the mesh&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Increasing aggregate message transfer capacity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improving resource utilization of messaging infrastructure&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The deployment of the dispatch-router will be new to
OpenStack operators. Operators will need to learn the
architectural differences as compared to a broker cluster
deployment. This will include capacity planning, monitoring,
troubleshooting and maintenance best practices.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Support for alternative oslo.messaging backends and deployment of
qpid-dispatch-router in addition to rabbitMQ should be implemented for
tripleo-quickstart.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignee:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;John Eckersberg &amp;lt;&lt;a class="reference external" href="mailto:jeckersb%40redhat.com"&gt;jeckersb&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Andy Smith &amp;lt;&lt;a class="reference external" href="mailto:ansmith%40redhat.com"&gt;ansmith&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Update overcloud templates for dual backends and dispatch-router service&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add dispatch-router packages to overcloud image elements&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add services template for dispatch-router&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update OpenStack services base templates to select and configure
transport_urls for RPC and Notification&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy dispatch-router for controller and compute for topology&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test failure and recovery scenarios for dispatch-router&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="transport-configuration"&gt;
&lt;h3&gt;Transport Configuration&lt;/h3&gt;
&lt;p&gt;The oslo.messaging configuration options define a default and
additional notification transport_url. If the notification
transport_url is not specified, oslo.messaging will use the default
transport_url for both RPC and Notification messaging communications.&lt;/p&gt;
&lt;p&gt;The transport_url parameter is of the form:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="k"&gt;pass&lt;/span&gt;&lt;span class="nd"&gt;@host1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;[,&lt;/span&gt;&lt;span class="n"&gt;hostN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;portN&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;virtual_host&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Here the transport scheme selects the RPC or Notification backend, for
example rabbit or amqp. Oslo.messaging is deprecating the host,
port and auth configuration options. All drivers will get these
options via the transport_url.&lt;/p&gt;
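&lt;p&gt;As a minimal sketch of the dual-backend case described above, a service
configuration file could carry two transport_url settings: one in the DEFAULT
group for RPC and one in the oslo_messaging_notifications group for
Notifications. The host names, ports and credentials below are placeholders
chosen for illustration and do not come from this spec:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;[DEFAULT]
# RPC traffic uses the AMQP 1.0 driver via the dispatch-router (placeholder hosts)
transport_url = amqp://rpc_user:rpc_pass@router1.example.com:5672,router2.example.com:5672/

[oslo_messaging_notifications]
# Notifications stay on the broker (placeholder host)
transport_url = rabbit://notify_user:notify_pass@rabbit1.example.com:5672/
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If the second setting is omitted, the behaviour falls back to the single
default transport_url for both kinds of traffic, as noted above.&lt;/p&gt;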
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Support for dual backends and AMQP 1.0 driver integration
with the dispatch-router depends on oslo.messaging V5.10 or later.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;In order to test this in CI, an environment will be needed where dual
messaging system backends (e.g. rabbitMQ server and dispatch-router
server) are deployed. Any existing hardware configuration should be
appropriate for the dual backend deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The deployment documentation will need to be updated to cover the
configuration of dual messaging system backends and the use of the
dispatch-router. TripleO Heat template examples should also help with
deployments using dual backends.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;[1] &lt;a class="reference external" href="https://blueprints.launchpad.net/oslo.messaging/+spec/amqp-dispatch-router"&gt;https://blueprints.launchpad.net/oslo.messaging/+spec/amqp-dispatch-router&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[2] &lt;a class="reference external" href="http://qpid.apache.org/components/dispatch-router/"&gt;http://qpid.apache.org/components/dispatch-router/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[3] &lt;a class="reference external" href="http://docs.openstack.org/developer/oslo.messaging/AMQP1.0.html"&gt;http://docs.openstack.org/developer/oslo.messaging/AMQP1.0.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[4] &lt;a class="reference external" href="https://etherpad.openstack.org/p/ocata-oslo-consistent-mq-backends"&gt;https://etherpad.openstack.org/p/ocata-oslo-consistent-mq-backends&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[5] &lt;a class="reference external" href="https://github.com/openstack/puppet-qdr"&gt;https://github.com/openstack/puppet-qdr&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Fri, 11 Nov 2016 00:00:00 </pubDate></item><item><title>Add Support for Custom TripleO Validations</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/custom-validations.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/custom-validations"&gt;https://blueprints.launchpad.net/tripleo/+spec/custom-validations&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;All validations are currently stored in a single directory. This makes
it inconvenient to try and write new validations, update from a remote
repository or to add an entirely new (perhaps private) source.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The deployer wants to develop and test their own validations in a
personal checkout without risking changes to the default ones.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The deployer wants to use a stable release of TripleO but consume
the latest validations because they are non-disruptive and cover
more checks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A third party has developed validations specific to their product
that they don’t want to or can’t include in the tripleo-validations
repository.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;We will store a default set of TripleO validations in a Swift container called
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-validations&lt;/span&gt;&lt;/code&gt;. These will be shared across all plans and are not
expected to be updated by the deployer. This container should be created on
initial undercloud deployment.&lt;/p&gt;
&lt;p&gt;We will provide a mechanism for deployers to add a custom set of validations
per deployment plan. These plan-specific validations will be stored in a
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;custom-validations&lt;/span&gt;&lt;/code&gt; subdirectory in the plan’s Swift container. Storing them
together with the plan makes sense because these validations can be specific to
a particular deployment plan configuration, and it also makes import/export
easier.&lt;/p&gt;
&lt;p&gt;Since custom validations will be stored as part of the plan, no additional
workflows/actions to perform CRUD operations for them will be necessary; we can
simply use the existing plan create/update for this purpose.&lt;/p&gt;
&lt;p&gt;The validation Mistral actions (e.g. &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;list&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;run_validation&lt;/span&gt;&lt;/code&gt;)
will need to be updated to take into account this new structure of
validations. They will need to look for validations in the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-validations&lt;/span&gt;&lt;/code&gt; Swift container (for default validations) and the
plan’s &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;custom-validations&lt;/span&gt;&lt;/code&gt; subdirectory (for custom validations), instead of
sourcing them from a directory on disk, as they are doing now.&lt;/p&gt;
&lt;p&gt;If a validation with the same name is found in both the default and the custom
validations, we will always pick the one stored in custom validations.&lt;/p&gt;
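&lt;p&gt;The precedence rule can be sketched as a simple merge keyed by validation name. This is a minimal sketch, not the tripleo-common implementation: the function name and the sample validation names are hypothetical, and the dictionaries stand in for whatever was read from the two Swift locations.&lt;/p&gt;

```python
def merge_validations(default_validations, custom_validations):
    """Combine both sources; a custom validation replaces a default one of the same name."""
    merged = dict(default_validations)
    merged.update(custom_validations)
    return merged


# Hypothetical contents, keyed by validation name.
defaults = {"check-ram": "default playbook", "check-disk": "default playbook"}
custom = {"check-ram": "plan-specific playbook", "check-vendor": "plan-specific playbook"}
available = merge_validations(defaults, custom)
```

&lt;p&gt;Here the plan-specific check-ram replaces the default one, while check-disk and check-vendor are both kept.&lt;/p&gt;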
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;As a further iteration, we can implement validations as per-service
tasks in standalone service Ansible roles. They can then be consumed
by tripleo-heat-templates service templates.&lt;/p&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Do nothing. The deployers can already bring in additional
validations, it’s just less convenient and potentially error-prone.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We could provide a known directory structure conceptually similar to
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;run-parts&lt;/span&gt;&lt;/code&gt; where the deployers could add their own validation
directories.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;In order to add their own validations, the deployer will need to
update the deployment plan by adding a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;custom-validations&lt;/span&gt;&lt;/code&gt; directory to it,
and making sure this directory contains the desired custom validations. The
plan update operation is already supported in the CLI and the UI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Since the validation sources will now be Swift containers, downloading
validations will potentially be necessary on each run. We will have to keep an
eye on this and potentially introduce caching if this turns out to be a problem.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Testing and developing new validations in both development and
production environments will be easier with this change.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignees:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;akrivoka&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;florianf&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Move to using Swift as default storage for tripleo-validations (&lt;a class="footnote-reference brackets" href="#id3" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;load_validations&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;find_validation&lt;/span&gt;&lt;/code&gt; functions to
read validations from all the sources specified in this document.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;In order to be able to implement this new functionality, we first need to have
the validations use Swift as the default storage. In other words, this spec
depends on the blueprint &lt;a class="footnote-reference brackets" href="#id3" id="id2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The changes will be unit-tested in all the tripleo repos that related
changes land in (tripleo-common, instack-undercloud, tripleo-heat-templates,
etc).&lt;/p&gt;
&lt;p&gt;We could also add a new CI scenario that would have a custom-validations
directory within a plan set up.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will need to document the format of the new custom-validations plan
subdirectory and the new behaviour this will introduce.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="id3" role="note"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class="backrefs"&gt;(&lt;a role="doc-backlink" href="#id1"&gt;1&lt;/a&gt;,&lt;a role="doc-backlink" href="#id2"&gt;2&lt;/a&gt;)&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/store-validations-in-swift"&gt;https://blueprints.launchpad.net/tripleo/+spec/store-validations-in-swift&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;
</description><pubDate>Fri, 04 Nov 2016 00:00:00 </pubDate></item><item><title>GUI Deployment configuration improvements</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/gui-deployment-configuration.html</link><description>
 
&lt;p&gt;TripleO UI deployment configuration is based on enabling environments provided by
the deployment plan (tripleo-heat-templates) and letting the user set parameter values.&lt;/p&gt;
&lt;p&gt;This spec proposes improvements to this approach.&lt;/p&gt;
&lt;p&gt;Blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/deployment-configuration-improvements"&gt;https://blueprints.launchpad.net/tripleo/+spec/deployment-configuration-improvements&lt;/a&gt;&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The general goal of the TripleO UI is to guide the user through the deployment
process and provide relevant information along the way, so the user does not
have to search the documentation or analyze TripleO templates for context.&lt;/p&gt;
&lt;p&gt;A set of problems has been identified with the current deployment configuration
solution. Resolving those problems should lead to an improved user experience when
making deployment design decisions.&lt;/p&gt;
&lt;p&gt;The important information about the usage of an environment and its relevant
parameters is usually included as a comment in the environment file itself. The GUI
cannot consume such comments. We currently use capabilities-map.yaml to define
environment metadata to work around this.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;As the number of environments grows, it is hard to keep capabilities-map.yaml
up to date. When an environment is added, capabilities-map.yaml is usually
not updated by the same developer, which leads to inaccurate environment
descriptions when they are added later.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The environments depend on each other and can collide when used together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There is no way to list the parameters relevant to a given environment and
let the user set them. These are currently listed as comments in the
environments, which the GUI cannot consume (example: [1]).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There are not enough means to organize the parameters returned by
heat validate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not all parameters defined in tripleo-heat-templates have the correct type
set or include all the relevant information that the HOT spec supports
(constraints…).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The same parameters are defined in multiple templates in tripleo-heat-templates,
but their definitions differ.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The list of parameters that are supposed to be auto-generated when the user
does not provide a value is hard-coded in the deployment workflow.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Propose environment metadata to track additional information about an environment
directly as part of the file in Heat (partially in progress [2]). A similar concept is
already present in Heat resources [3].
In the meantime, update the tripleo-common environment listing feature to read
environments and include environment metadata.&lt;/p&gt;
&lt;p&gt;Each TripleO environment file should define:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;human&lt;/span&gt; &lt;span class="n"&gt;readable&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt; &lt;span class="n"&gt;purpose&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="n"&gt;resource_registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;parameter_defaults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With the environment metadata in place, the purpose of capabilities-map.yaml
would be reduced to defining grouping and dependencies among environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement environment parameter listing in TripleO UI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To organize parameters we should use ParameterGroups.
(related discussion: [4])&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make sure that the same parameters are defined the same way across
tripleo-heat-templates. There may be exceptions, but in those cases we must ensure
that two templates which define the same parameter differently won’t be used at the
same time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update parameter definitions in TripleO templates so the type actually matches
the expected parameter value (e.g. ‘string’ vs ‘boolean’). This will result in the
correct input type being used in the GUI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define a custom constraint for parameters which are supposed to be auto-generated.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
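&lt;p&gt;To make the environment metadata proposal more concrete, the sketch below shows how an environment listing could turn one parsed environment file into display information for the GUI. It assumes the YAML has already been loaded into a dictionary; the function name, the returned keys, the fallback to the file name and the sample values are all hypothetical and not part of tripleo-common.&lt;/p&gt;

```python
import os


def describe_environment(path, environment):
    """Return display information for one parsed environment file."""
    metadata = environment.get("metadata") or {}
    fallback = os.path.basename(path)  # used when no label is defined
    return {
        "file": path,
        "label": metadata.get("label", fallback),
        "description": metadata.get("description", ""),
        # Parameters set in parameter_defaults are the ones relevant to this environment.
        "parameters": sorted(environment.get("parameter_defaults") or {}),
    }


# Hypothetical parsed environment (the result of loading the YAML file).
env = {
    "metadata": {"label": "Storage", "description": "Configure storage backends"},
    "resource_registry": {},
    "parameter_defaults": {"GlanceBackend": "swift", "CinderEnableIscsiBackend": True},
}
info = describe_environment("environments/storage-environment.yaml", env)
```

&lt;p&gt;Listing the keys of parameter_defaults is one simple way to obtain the parameters relevant to an environment, which is why the work items ask for parameter_defaults to include all of them.&lt;/p&gt;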
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Potential alternatives to listing environment related parameters are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Use Parameter Groups to match template parameters to an environment. This
solution ties the template with an environment and clutters the template.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As the introduction of environment metadata depends on having this feature accepted
and implemented in Heat, an alternative solution is to keep the title and description
in the capabilities map as we do now.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;No significant security impact&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Resolving the problems described above greatly improves the TripleO UI workflow and
makes deployment configuration much more streamlined.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The described approach makes it possible to cache the result of Heat validation,
which is currently the most expensive operation. The cache becomes invalid only
when a deployment plan is updated or switched.&lt;/p&gt;
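&lt;p&gt;One way to picture this caching is a per-plan cache that is cleared whenever the plan changes. The sketch below is only an illustration under that assumption: the class, its method names and the stand-in validation function are hypothetical, and the real validation call is replaced by a callable passed in by the caller.&lt;/p&gt;

```python
class ValidationCache:
    """Cache the result of template validation, one entry per plan."""

    def __init__(self, validate):
        self._validate = validate  # callable performing the expensive validation
        self._results = {}

    def get(self, plan_name):
        """Return the cached result, validating only on a cache miss."""
        if plan_name not in self._results:
            self._results[plan_name] = self._validate(plan_name)
        return self._results[plan_name]

    def invalidate(self, plan_name):
        """Call when the plan is updated so the next get() validates again."""
        self._results.pop(plan_name, None)


calls = []


def fake_validate(plan_name):
    # Stand-in for the expensive validation; records each invocation.
    calls.append(plan_name)
    return {"plan": plan_name, "parameters": {}}


cache = ValidationCache(fake_validate)
cache.get("overcloud")
cache.get("overcloud")         # served from the cache
cache.invalidate("overcloud")  # the plan was updated
cache.get("overcloud")         # validated again
```

&lt;p&gt;Keying the cache by plan name also covers switching plans: each plan keeps its own entry, so switching back does not force a new validation unless that plan was updated in the meantime.&lt;/p&gt;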
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Same as End User Impact&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jtomasek&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;rbrady&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates: update environments to include metadata (label,
description), update parameter_defaults to include all parameters relevant
to the environment&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/update-environment-files-with-related-parameters"&gt;https://blueprints.launchpad.net/tripleo/+spec/update-environment-files-with-related-parameters&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates: update capabilities-map.yaml to map environment
grouping and dependencies&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/update-capabilities-map-to-map-environment-dependencies"&gt;https://blueprints.launchpad.net/tripleo/+spec/update-capabilities-map-to-map-environment-dependencies&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates: create parameter groups for deprecated and internal
parameters&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates: make sure that same parameters have the same definition&lt;/p&gt;
&lt;p&gt;bug: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1640243"&gt;https://bugs.launchpad.net/tripleo/+bug/1640243&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates: make sure type is properly set for all parameters&lt;/p&gt;
&lt;p&gt;bug: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1640248"&gt;https://bugs.launchpad.net/tripleo/+bug/1640248&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates: create custom constraint for autogenerated parameters&lt;/p&gt;
&lt;p&gt;bug: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1636987"&gt;https://bugs.launchpad.net/tripleo/+bug/1636987&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-common: update environments listing to combine capabilities map with
environment metadata&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/update-capabilities-map-to-map-environment-dependencies"&gt;https://blueprints.launchpad.net/tripleo/+spec/update-capabilities-map-to-map-environment-dependencies&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-ui: Environment parameters listing&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/get-environment-parameters"&gt;https://blueprints.launchpad.net/tripleo/+spec/get-environment-parameters&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-common: autogenerate values for parameters with custom constraint&lt;/p&gt;
&lt;p&gt;bug: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1636987"&gt;https://bugs.launchpad.net/tripleo/+bug/1636987&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-ui: update environment configuration to reflect API changes, provide means to display and configure environment parameters&lt;/p&gt;
&lt;p&gt;blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-ui-deployment-configuration"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-ui-deployment-configuration&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-ui: add client-side parameter validations based on parameter type
and constraints&lt;/p&gt;
&lt;p&gt;bugs: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1638523"&gt;https://bugs.launchpad.net/tripleo/+bug/1638523&lt;/a&gt;, &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1640463"&gt;https://bugs.launchpad.net/tripleo/+bug/1640463&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-ui: don’t show parameters included in deprecated and internal groups&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
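&lt;p&gt;The work item on auto-generated parameters can be sketched as follows. The sketch assumes that parameter definitions are available as dictionaries following the HOT layout, where a constraint entry may carry a custom_constraint key. The constraint name, the function names and the sample parameters are hypothetical, since this spec does not name the constraint.&lt;/p&gt;

```python
import secrets

# Hypothetical constraint name; the spec does not name the custom constraint.
AUTOGENERATED = "tripleo.autogenerated"


def is_autogenerated(definition):
    """True if the parameter definition carries the marker custom constraint."""
    return any(
        constraint.get("custom_constraint") == AUTOGENERATED
        for constraint in definition.get("constraints", [])
    )


def fill_autogenerated(definitions, user_values):
    """Return the user values plus generated values for unset marked parameters."""
    values = dict(user_values)
    for name, definition in definitions.items():
        if is_autogenerated(definition) and name not in values:
            values[name] = secrets.token_hex(20)  # 40 hexadecimal characters
    return values


definitions = {
    "AdminPassword": {
        "type": "string",
        "constraints": [{"custom_constraint": AUTOGENERATED}],
    },
    "NtpServer": {"type": "string"},
}
values = fill_autogenerated(definitions, {"NtpServer": "pool.ntp.org"})
```

&lt;p&gt;Marking the parameters in the templates themselves means the deployment workflow no longer needs a hard-coded list; any parameter carrying the constraint is picked up automatically, and a value supplied by the user is never overwritten.&lt;/p&gt;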
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Heat Environment metadata discussion [2]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Heat Parameter Groups discussion [4]&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The changes should be covered by unit tests in tripleo-common and GUI&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Part of this effort should be proper documentation of how TripleO environments
as well as capabilities-map.yaml should be defined.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/b6a4bdc3e4db97785b930065260c713f6e70a4da/environments/storage-environment.yaml"&gt;https://github.com/openstack/tripleo-heat-templates/blob/b6a4bdc3e4db97785b930065260c713f6e70a4da/environments/storage-environment.yaml&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[2] &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-June/097178.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2016-June/097178.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[3] &lt;a class="reference external" href="http://docs.openstack.org/developer/heat/template_guide/hot_spec.html#resources-section"&gt;http://docs.openstack.org/developer/heat/template_guide/hot_spec.html#resources-section&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[4] &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-August/102297.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2016-August/102297.html&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Thu, 03 Nov 2016 00:00:00 </pubDate></item><item><title>PKI management of the overcloud using Certmonger</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/ssl-certmonger.html</link><description>
 
&lt;p&gt;There is currently support for enabling SSL for the public endpoints of the
OpenStack services. However, certain use cases require the availability of SSL
everywhere. This spec proposes an approach to enable it.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Even though there is support for deploying both the overcloud and the
undercloud with TLS/SSL support for the public endpoints, there are deployments
that demand the usage of encrypted communications through all the interfaces.&lt;/p&gt;
&lt;p&gt;The current approach for deploying SSL in TripleO is to inject the needed
keys/certificates through Heat environment files; this requires the
pre-creation of those. While this approach works for the public-facing
services, as we attempt to secure the communication between different
services, and in different levels of the infrastructure, the number of keys
and certificates grows. Getting the deployer to generate and manage all the
certificates would be quite cumbersome.&lt;/p&gt;
&lt;p&gt;On the other hand, TripleO is not meant to handle the PKI of the cloud. Since
we will at some point need to enable the deployer to renew, revoke and keep
track of the certificates and keys deployed in the cloud, we need a system
with such capabilities.&lt;/p&gt;
&lt;p&gt;Instead of brewing an OpenStack-specific solution ourselves, I propose the
usage of already existing systems that will make this a lot easier.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The proposal is to start using certmonger[1] in the nodes of the overcloud to
interact with a CA for managing the certificates that are being used. With this
tool, we can request the fetching of the needed certificates for interfaces
such as the internal OpenStack endpoints, the database cluster and the message
broker for the cloud. Those certificates will in turn have automatic tracking,
and for cases where there is a certificate to identify the node, it could
even automatically request a renewal of the certificate when needed.&lt;/p&gt;
&lt;p&gt;Certmonger is already available in several distributions (both Red Hat and
Debian based) and has the capability of interacting with several CAs, so if the
operator already has a working one, they could use that. In addition,
certmonger has a mechanism for registering new CAs and executing scripts
(which are customizable) to communicate with those CAs. Those scripts are
language independent. For the open source community, a solution
such as FreeIPA[2] or Dogtag[3] could be used to act as a CA and handle the
certificates and keys for us. Note that it’s possible to write a plugin for
certmonger to communicate with Barbican or another CA, if that’s what we would
like to go for.&lt;/p&gt;
&lt;p&gt;In the FreeIPA case, this will require a full FreeIPA system running either on
another node in the cluster or in the undercloud in a container[4].&lt;/p&gt;
&lt;p&gt;For cases where the services are terminated by HAProxy and the overcloud is
in an HA deployment, the controller nodes will need to share a certificate that
HAProxy will present when accessed. In this case, the workflow will be as
follows:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Register the undercloud as a FreeIPA client. This configures the Kerberos
environment and provides access control to the undercloud node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get the keytab (credentials) corresponding to the undercloud in order to access
FreeIPA and be able to register nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an HAProxy service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a certificate/key for that service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Store the key in FreeIPA’s Vault.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create each of the controllers to be deployed as hosts in FreeIPA (see
the first note below).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On each controller node, get the certificate from the service entry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fetch the key from the FreeIPA vault.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set certmonger to keep track of the resulting certificates and
keys.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;While the process of creating each node beforehand could sound cumbersome,
this can be automated to increase usability. The proposed approach is to
have a nova micro-service that automatically registers the nodes from the
overcloud when they are created [5]. This hook will not only register the
node in the system, but will also inject an OTP which the node can use to
fetch the required credentials and get its corresponding certificate and
key. The aforementioned OTP is only used for enrollment. Once enrollment
has taken place, certmonger can be used to fetch
certificates from FreeIPA.&lt;/p&gt;
&lt;p&gt;However, even if this micro-service is not in place, we could pass the OTP
via the TripleO Heat Templates (in the overcloud deployment). So it is
possible to have the controllers fetching their keytab and subsequently
request their certificates even if we don’t have auto-enrollment in place.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Barbican could also be used instead of FreeIPA’s Vault, with the upside of
it being an already accepted OpenStack service. However, Barbican will also
need to have a backend, which might be Dogtag in our case, since having an
HSM for the CI will probably not be an option.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For services such as the message broker, where an individual certificate
is required per host, the process is much simpler, since the nodes will have
already been registered in FreeIPA and will be able to fetch their credentials.
We can then let certmonger request and subsequently track
the appropriate certificates.&lt;/p&gt;
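&lt;p&gt;For a per-host certificate, the request handed to certmonger could look roughly like the command assembled below. This is a sketch under assumptions: the helper name, the service name, the host name and the file paths are hypothetical. The options should be checked against the getcert man page of the installed certmonger version before use, and the command is only built here, not executed.&lt;/p&gt;

```python
def getcert_request_command(service, hostname, cert_path, key_path):
    """Build the argv for an ipa-getcert tracking request (not executed here)."""
    return [
        "ipa-getcert", "request",
        "-f", cert_path,                # file where the certificate is stored
        "-k", key_path,                 # file where the private key is stored
        "-K", f"{service}/{hostname}",  # Kerberos principal of the service
        "-D", hostname,                 # DNS name placed in the certificate
    ]


# Hypothetical service, host and paths for a message broker node.
command = getcert_request_command(
    "rabbitmq",
    "overcloud-controller-0.example.com",
    "/etc/pki/tls/certs/rabbitmq.crt",
    "/etc/pki/tls/private/rabbitmq.key",
)
```

&lt;p&gt;Once such a request has been issued on an enrolled node, certmonger keeps tracking the certificate, which is what allows renewals to be requested without involving the deployer.&lt;/p&gt;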
&lt;p&gt;Once the certificates and keys are present in the nodes, we can let the
subsequent steps of the overcloud deployment process take place, so the
services will be configured to use those certificates and enable TLS where the
deployer specifies it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The alternative is to take the same approach as we did for the public
endpoints, which is to simply inject the certificates and keys into the nodes.
That would have the downside that the certificates and keys will be pasted into
Heat environment files. This will be problematic for services such as RabbitMQ,
where we are giving a list of nodes for communication, because to enable SSL in
it, we need to have a certificate for each node serving as a message broker.
In this case two approaches could be taken:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We will need to copy and paste each certificate and key that is needed for
each of the nodes. The downsides are how much text needs to be copied
and the difficulty of keeping track of the certificates. In addition,
each time a node is removed or added, we need to make sure we remember to add
a certificate and a key for it in the environment file. So this becomes a
scaling and a usability issue too.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We could also provide an intermediate certificate, and let TripleO create the
certificates and keys per-service. However, even if this fixes the usability
issue, we still cannot keep track of the specific certificates and keys that
are being deployed in the cloud.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;This approach enables better security for the overcloud, as it not only makes
it easier to enable TLS everywhere (if desired) but also helps us keep track of and
manage our PKI. It also enables other means of security, such as
mutual authentication. In the case of FreeIPA, we could let the nodes have
client certificates, and so they would be able to authenticate to the services
(as is possible with tools such as HAProxy or Galera/MySQL). However, this can
be done as follow-up work.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;For doing this, the user will need to pass extra parameters to the overcloud
deployment, such as the CA information. In the case of FreeIPA, we will need to
pass the host and port, the kerberos realm, the kerberos principal of the
undercloud and the location of the keytab (the credentials) for the undercloud.&lt;/p&gt;
&lt;p&gt;However, this will be reflected in the documentation.&lt;/p&gt;
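&lt;p&gt;As a rough sketch of what passing these values could look like, the heat
environment below groups them under &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;parameter_defaults&lt;/span&gt;&lt;/code&gt;. All
parameter names and values here are hypothetical placeholders chosen for
illustration; this spec does not define them.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Hypothetical environment file; every parameter name is illustrative only.
parameter_defaults:
  # Host and port of the FreeIPA server
  ExampleFreeIPAHost: ipa.example.com
  ExampleFreeIPAPort: 443
  # Kerberos realm and the Kerberos principal of the undercloud
  ExampleKerberosRealm: EXAMPLE.COM
  ExampleUndercloudPrincipal: nova/undercloud.example.com@EXAMPLE.COM
  # Location of the keytab holding the undercloud credentials
  ExampleUndercloudKeytab: /etc/example/undercloud.keytab
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;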
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Having SSL everywhere will degrade the performance of the overcloud overall, as
there will be some overhead in each call. However, this is a known issue and
this is why SSL everywhere is optional. It should only be enabled for deployers
that really need it.&lt;/p&gt;
&lt;p&gt;The usage of an external CA or FreeIPA shouldn’t impact the overcloud
performance, as the operations that it will be doing are not recurrent
operations (issuing, revoking or renewing certificates).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;If a deployer wants to enable SSL everywhere, they will need a working CA.
If they don’t have one, they could install FreeIPA on a node.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Discuss things that will affect other developers working on OpenStack.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jaosorior&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Enable certmonger and the FreeIPA client tools in the overcloud image
elements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Include the host auto-join hook for nova in the undercloud installation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create nested templates that will be used in the existing places for the
NodeTLSData and NodeTLSCAData. These templates will do the certmonger
certificate fetching and tracking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure the OpenStack internal endpoints to use TLS and make this optional
through a heat environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure the Galera/MySQL cluster to use TLS and make this optional through
a heat environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure RabbitMQ to use TLS (which means having a certificate for each
node) and make this optional through a heat environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a CI gate for SSL everywhere. This will include a FreeIPA installation
and it will enable SSL for all the services, ending in the running of a
pingtest. For the FreeIPA preparations, a script running before the overcloud
deployment will add the undercloud as a client, configure the appropriate
permissions for it and deploy a keytab so that it can use the nova hook.
Subsequently it will create a service for the OpenStack internal endpoints,
and the database, which it will use to create the needed certificates and
keys.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
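&lt;p&gt;To illustrate how the nested templates could be made optional through a heat
environment, the fragment below overrides the two resources named in the work
items above. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::&lt;/span&gt;&lt;/code&gt; prefix on the registry keys is an
assumption, and the template paths are hypothetical placeholders.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Hypothetical environment file; only used when the deployer opts in to TLS.
resource_registry:
  # Nested template that requests and tracks node certificates via certmonger
  OS::TripleO::NodeTLSData: /path/to/example-certmonger-tls.yaml
  # Nested template that installs the CA certificate on each node
  OS::TripleO::NodeTLSCAData: /path/to/example-certmonger-ca.yaml
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A deployer who does not pass such an environment would keep the existing
behaviour.&lt;/p&gt;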
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This requires the following bug to be fixed in Nova:
&lt;a class="reference external" href="https://bugs.launchpad.net/nova/+bug/1518321"&gt;https://bugs.launchpad.net/nova/+bug/1518321&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Also requires the packaging of the nova hook.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We will need to create a new gate in CI to test this.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The documentation on how to use an external CA and how to install and use
FreeIPA with TripleO needs to be created.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://fedorahosted.org/certmonger/"&gt;https://fedorahosted.org/certmonger/&lt;/a&gt;
[2] &lt;a class="reference external" href="http://www.freeipa.org/page/Main_Page"&gt;http://www.freeipa.org/page/Main_Page&lt;/a&gt;
[3] &lt;a class="reference external" href="http://pki.fedoraproject.org/wiki/PKI_Main_Page"&gt;http://pki.fedoraproject.org/wiki/PKI_Main_Page&lt;/a&gt;
[4] &lt;a class="reference external" href="http://www.freeipa.org/page/Docker"&gt;http://www.freeipa.org/page/Docker&lt;/a&gt;
[5] &lt;a class="reference external" href="https://github.com/richm/rdo-vm-factory/blob/use-centos/rdo-ipa-nova/novahooks.py"&gt;https://github.com/richm/rdo-vm-factory/blob/use-centos/rdo-ipa-nova/novahooks.py&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Wed, 02 Nov 2016 00:00:00 </pubDate></item><item><title>Validations in TripleO Workflows</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/validations-in-workflows.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/validations-in-workflows"&gt;https://blueprints.launchpad.net/tripleo/+spec/validations-in-workflows&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Newton release introduced TripleO validations – a set of
extendable checks that identify potential deployment issues early and
verify that the deployed OpenStack is set up properly. These
validations are automatically being run by the TripleO UI, but there
is no support for the command line workflow and they’re not being
exercised by our CI jobs either.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;When enabled, TripleO UI runs the validations at the appropriate phase
of the planning and deployment. This is done within the TripleO UI
codebase and therefore not available to python-tripleoclient or
the CI.&lt;/p&gt;
&lt;p&gt;The TripleO deployer can run the validations manually, but they need
to know at which point to do so and they will need to do it by calling
Mistral directly.&lt;/p&gt;
&lt;p&gt;This causes a disparity between the command line and GUI experience
and complicates the efforts to exercise the validations by the CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Each validation already advertises where in the planning/deployment
process it should be run. This is under the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vars/metadata/groups&lt;/span&gt;&lt;/code&gt;
section. In addition, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo.validations.v1.run_groups&lt;/span&gt;&lt;/code&gt;
Mistral workflow lets us run all validations belonging to a given
group.&lt;/p&gt;
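&lt;p&gt;For reference, a validation is an Ansible playbook that carries this
information in its variables. The sketch below shows the general shape; the
validation name, description and the single task are made up for
illustration.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;---
- hosts: undercloud
  vars:
    metadata:
      name: Example validation            # illustrative name
      description: Checks that an example file exists.
      # Groups tell the workflow at which phase this validation should run
      groups:
        - pre-deployment
  tasks:
    - name: Look for the example file
      stat:
        path: /etc/example/example.conf    # hypothetical path
      register: example_conf
    - name: Fail when the example file is missing
      fail:
        msg: "example.conf was not found"
      when: not example_conf.stat.exists
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;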
&lt;p&gt;For each validation group (currently &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pre-introspection&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;pre-deployment&lt;/span&gt;&lt;/code&gt;
and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;post-deployment&lt;/span&gt;&lt;/code&gt;) we will update the appropriate workflow in
tripleo-common to optionally call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;run_groups&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Each of the workflows above will receive a new Mistral input called
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;run_validations&lt;/span&gt;&lt;/code&gt;. It will be a boolean value that indicates whether
the validations ought to be run as part of that workflow or not.&lt;/p&gt;
&lt;p&gt;To expose this functionality to the command line user, we will add an
option for enabling/disabling validations into python-tripleoclient
(which will set the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;run_validations&lt;/span&gt;&lt;/code&gt; Mistral input) and a way to
show the results of each validation to the screen output.&lt;/p&gt;
&lt;p&gt;When the validations are run, they will report their status to Zaqar
and any failures will block the deployment. The deployer can disable
validations if they wish to proceed despite failures.&lt;/p&gt;
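&lt;p&gt;A minimal sketch of how a workflow could branch on the new input is shown
below, using the Mistral v2 workflow language. The workflow name, the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;container&lt;/span&gt;&lt;/code&gt; input, the inputs passed to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;run_groups&lt;/span&gt;&lt;/code&gt; and the
placeholder deploy task are assumptions for illustration, not the actual
tripleo-common definitions. It also assumes that the sub-workflow ends in
error when a validation fails.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;---
version: '2.0'
name: example.deployment.v1          # hypothetical workbook name
workflows:
  deploy_plan:
    input:
      - container
      - run_validations: false       # validations are opt-in in this sketch
    tasks:
      # Entry point: pick the next task based on the run_validations input.
      start:
        action: std.noop
        on-success:
          - pre_deployment_validations: &amp;lt;% $.run_validations %&amp;gt;
          - deploy: &amp;lt;% not $.run_validations %&amp;gt;
      # Run every validation in the pre-deployment group.
      pre_deployment_validations:
        workflow: tripleo.validations.v1.run_groups
        input:
          group_names:               # assumed input name
            - pre-deployment
          plan: &amp;lt;% $.container %&amp;gt;    # assumed input name
        # Only reached when the sub-workflow succeeds, so failures block deploy.
        on-success: deploy
      deploy:
        action: std.noop             # placeholder for the real deployment action
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;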
&lt;p&gt;One unresolved question is the post-deployment validations. The Heat
stack create/update Mistral action is currently asynchronous and we
have no way of calling actions after the deployment has finished.
Unless we change that, the post-deployment validations may have to be
run manually (or via python-tripleoclient).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Document where to run each group and how and leave it at that. This
risks that the users already familiar with TripleO may miss the
validations or that they won’t bother.&lt;/p&gt;
&lt;p&gt;We would still need to find a way to run validations in a CI job,
though.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide subcommands to run validations (and groups of validations)
into python-tripleoclient and rely on people running them manually.&lt;/p&gt;
&lt;p&gt;This is similar to 1., but provides an easier way of running a
validation and getting its result.&lt;/p&gt;
&lt;p&gt;Note that this may be a useful addition even with the proposal
outlined in this specification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Do what the GUI does in python-tripleoclient, too. The client will
know when to run which validation and will report the results back.&lt;/p&gt;
&lt;p&gt;The drawback is that we’ll need to implement and maintain the same
set of rules in two different codebases and have no API to do them.
I.e. what the switch to Mistral is supposed to solve.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;We will need to modify python-tripleoclient to be able to display the
status of validations once they have finished. TripleO UI already does this.&lt;/p&gt;
&lt;p&gt;The deployers may need to learn about the validations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Running a validation can take about a minute (this depends on the
nature of the validation, e.g. does it check a configuration file or
does it need to log in to all compute nodes).&lt;/p&gt;
&lt;p&gt;This can be a concern if we run multiple validations at the same
time.&lt;/p&gt;
&lt;p&gt;We should be able to run the whole group in parallel. It’s possible
we’re already doing that, but this needs to be investigated.
Specifically, does &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;with-items&lt;/span&gt;&lt;/code&gt; run the tasks in sequence or in
parallel?&lt;/p&gt;
&lt;p&gt;There are also some options that would allow us to speed up the
running time of a validation itself, by using common ways of speeding
up Ansible playbooks in general:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Disabling the default “setup” task for validations that don’t need
it (this task gathers hardware and system information about the
target node and it takes some time)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using persistent SSH connections&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Making each validation task run independently (by default, Ansible
runs a task on all the nodes, waits for its completion everywhere
and then moves on to another task)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Each validation runs the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-ansible-inventory&lt;/span&gt;&lt;/code&gt; script which
gathers information about deployed servers and configuration from
Mistral and Heat. Running this script can be slow. When we run
multiple validations at the same time, we should generate the
inventory only once and cache the results.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
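&lt;p&gt;The first three options above map onto standard Ansible settings. The
playbook below is a sketch under that assumption; the host group, file path
and check are hypothetical, and inventory caching is not shown.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;---
- hosts: overcloud                   # hypothetical inventory group
  # Skip the default "setup" task when the validation needs no facts.
  gather_facts: false
  # Let each host move through the tasks without waiting for the others.
  strategy: free
  vars:
    # Keep SSH connections open and reuse them between tasks.
    ansible_ssh_common_args: "-o ControlMaster=auto -o ControlPersist=60s"
  tasks:
    - name: Look for the example configuration file
      stat:
        path: /etc/example/example.conf
      register: example_conf
    - name: Fail when the example configuration file is missing
      fail:
        msg: "example.conf was not found"
      when: not example_conf.stat.exists
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Disabling fact gathering only helps validations that never reference
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ansible_*&lt;/span&gt;&lt;/code&gt; facts, so it would have to be decided per validation.&lt;/p&gt;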
&lt;p&gt;Since the validations are going to be optional, the deployer can
always choose not to run them. On the other hand, any slowdown should
ideally be outweighed by the time saved investigating failed deployments.&lt;/p&gt;
&lt;p&gt;We will also document the actual time difference. This information
should be readily available from our CI environments, but we should
also provide measurements on the bare metal.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Depending on whether the validations will be run by default or not,
the only impact should be an option that lets the deployer run them
or not.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;The TripleO developers may need to learn about validations, where to
find them and how to change them.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;tsedovic&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;None&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;run_validations&lt;/span&gt;&lt;/code&gt; input and call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;run_groups&lt;/span&gt;&lt;/code&gt; from the
deployment and node registration workflows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add an option to run the validations to python-tripleoclient&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Display the validations results with python-tripleoclient&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add or update a CI job to run the validations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add a CI job to tripleo-validations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;This should make the validations testable in CI. Ideally, we would
verify the expected success/failure for the known validations given
the CI environment. But having them go through the testing machinery
would be a good first step to ensure we don’t break anything.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will need to document the fact that we have validations, where they
live, and when and how they are run.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://docs.openstack.org/developer/tripleo-common/readme.html#validations"&gt;http://docs.openstack.org/developer/tripleo-common/readme.html#validations&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://git.openstack.org/cgit/openstack/tripleo-validations/"&gt;http://git.openstack.org/cgit/openstack/tripleo-validations/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://docs.openstack.org/developer/tripleo-validations/"&gt;http://docs.openstack.org/developer/tripleo-validations/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 02 Nov 2016 00:00:00 </pubDate></item><item><title>Composable Service Upgrades</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/tripleo-composable-upgrades.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades-per-service"&gt;https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades-per-service&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In the Newton release TripleO delivered a new capability to deploy arbitrary
custom &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/custom-roles"&gt;roles&lt;/a&gt; (groups of nodes) with a lot of flexibility of which services
are placed on which roles (using &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/78500bc2e606bd1f80e05d86bf7da4d1d27f77b1/roles_data.yaml"&gt;roles_data.yaml&lt;/a&gt;). This means we can no
longer make the same assumptions about a specific service running on a
particular role (e.g. Controller).&lt;/p&gt;
&lt;p&gt;The current upgrades &lt;a class="reference external" href="http://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html"&gt;workflow&lt;/a&gt; is organised around the node role determining
the order in which that given node and services deployed therein are upgraded.
The workflow dictates “swifts”, before “controllers”, before “cinders”, before
“computes”, before “cephs”. The reasons for this ordering are beyond the scope
here and ultimately inconsequential, since the important point to note is
there is a hard coded relationship between a given service and a given node
with respect to upgrading that service (e.g. a script that upgrades all
services on “Compute” nodes).  For upgrades from Newton to Ocata we can no
longer make these assumptions about services being tied to a specific role,
so a more composable model is needed.&lt;/p&gt;
&lt;p&gt;Consensus after the initial discussion during the Ocata design summit &lt;a class="reference external" href="https://etherpad.openstack.org/p/ocata-tripleo-upgrades"&gt;session&lt;/a&gt;
was that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Re-engineering the upgrades workflow for Newton to Ocata is necessary
because of ‘custom roles’&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We should start by moving the upgrades logic into the composable service
templates in the tripleo-heat-templates (i.e. into each service)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There is still a need for an over-arching workflow - albeit service
rather than role oriented.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is TBD what will drive that workflow. We will use whatever will be
‘easier’ for a first iteration, especially given the Ocata development
time constraints.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;As explained in the introduction above, the current upgrades &lt;a class="reference external" href="http://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html"&gt;workflow&lt;/a&gt; can no
longer work for composable service deployments. Right now the upgrade scripts
are organised around and indeed targeted at specific nodes: the upgrade
script for &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_object_storage.sh"&gt;swifts&lt;/a&gt; is different to that for &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_compute.sh"&gt;computes&lt;/a&gt; or for controllers (split
across a &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh"&gt;number&lt;/a&gt; &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh"&gt;of&lt;/a&gt; &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_3.sh"&gt;steps&lt;/a&gt;) &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_block_storage.sh"&gt;cinders&lt;/a&gt; or &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_ceph_storage.sh"&gt;cephs&lt;/a&gt;. These scripts are invoked
as part of a workflow where each step is either a heat stack update or
invocation of the &lt;a class="reference external" href="https://github.com/openstack/tripleo-common/blob/01b68d0b0cdbd0323b7f006fbda616c12cbf90af/scripts/upgrade-non-controller.sh"&gt;upgrade-non-controller.sh&lt;/a&gt; script to execute the node
specific upgrade script (delivered as one of the earlier steps in the workflow)
on non controllers.&lt;/p&gt;
&lt;p&gt;One way to handle this problem is to decompose the upgrades logic
from those monolithic per-node upgrade scripts into per-service upgrades logic.
This should live in the tripleo-heat-templates puppet &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/tree/master/puppet/services"&gt;services&lt;/a&gt; templates for
each service. For the upgrade of a given service we need to express:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;any pre-upgrade requirements (run a migration, stop a service, pin RPC)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;any post upgrade (migrations, service starts/reload config)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;any dependencies on other services (upgrade foo only after bar)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;If we organise the upgrade logic in this manner the idea is to gain the
flexibility to combine this dynamically into the new upgrades workflow.
Besides the per-service upgrades logic the workflow will also need to handle
and provide for any deployment wide upgrades related operations such as
unpin of the RPC version once all services are successfully running Ocata, or
upgrading of services that aren’t directly managed or configured by the
tripleo deployment (like openvswitch as just one example), or even the delivery
of a new kernel which will require a reboot on the given service node after
all services have been upgraded.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;The first step is to work out where to add upgrades related configuration to
each service in the tripleo-heat-templates &lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/tree/master/puppet/services"&gt;services&lt;/a&gt; templates. The exact
format will depend on what we end up using to drive the workflow. We could
include them in the &lt;em&gt;outputs&lt;/em&gt; as ‘upgrade_tasks’, like:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;role_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Role&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;Nova&lt;/span&gt; &lt;span class="n"&gt;Compute&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;nova_compute&lt;/span&gt;
      &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;upgrade_tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RPC&lt;/span&gt; &lt;span class="n"&gt;pin&lt;/span&gt; &lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;
          &lt;span class="n"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"crudini --set /etc/nova/nova.conf upgrade_levels compute $upgrade_level_nova_compute"&lt;/span&gt;
          &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;step1&lt;/span&gt;
        &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stop&lt;/span&gt; &lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;
          &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stopped&lt;/span&gt;
          &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;step2&lt;/span&gt;
        &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;update&lt;/span&gt; &lt;span class="n"&gt;nova&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;
          &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;manage&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;
          &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;step3&lt;/span&gt;
        &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;
          &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;openstack&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;nova&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;
          &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;step4&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The current proposal is for the upgrade snippets to be expressed in Ansible.
The initial focus will be to drive the upgrade via the existing tripleo
tooling, e.g. heat applying ansible similar to how heat applies scripts for
the non composable implementation.  In future it may also be possible to
expose the per-role ansible playbooks to enable advanced operators to drive
the upgrade workflow directly, perhaps used in conjunction with the dynamic
inventory provided for tripleo validations.&lt;/p&gt;
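&lt;p&gt;As an illustration of what such a per-role playbook might look like when
written with stock Ansible modules, the sketch below restates the nova-compute
steps and keeps the step tags so that an operator could run one stage at a
time. The host group and the value of &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;upgrade_level_nova_compute&lt;/span&gt;&lt;/code&gt; are
hypothetical placeholders.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;---
- hosts: nova_compute_nodes          # hypothetical inventory group
  become: true
  vars:
    upgrade_level_nova_compute: auto # placeholder value
  tasks:
    # Pin the compute RPC version before any service is touched.
    - name: RPC pin nova-compute
      command: crudini --set /etc/nova/nova.conf upgrade_levels compute {{ upgrade_level_nova_compute }}
      tags: step1
    # Stop the service so that packages can be updated safely.
    - name: stop nova-compute
      service: name=openstack-nova-compute state=stopped
      tags: step2
    # Bring the service back once the upgrade of this node is complete.
    - name: start nova-compute
      service: name=openstack-nova-compute state=started
      tags: step4
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;With a layout like this, a single stage could be selected with
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ansible-playbook&lt;/span&gt; &lt;span class="pre"&gt;--tags&lt;/span&gt; &lt;span class="pre"&gt;step1&lt;/span&gt;&lt;/code&gt;, which fits the staged upgrade
approach.&lt;/p&gt;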
&lt;p&gt;One other point of note that was brought up in the Ocata design summit
&lt;a class="reference external" href="https://etherpad.openstack.org/p/ocata-tripleo-upgrades"&gt;session&lt;/a&gt; and which should factor into the design here is that operators may
wish to run the upgrade in stages rather than all at once. It could still be
the case that the new workflow can differentiate between ‘controlplane’
vs ‘non-controlplane’ services. The operator could then upgrade controlplane
services as one stand-alone upgrade step and then later start to roll out the
upgrade of non-controlplane services.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;One alternative is to have a stand-alone upgrades workflow driven by ansible.
Some early work and prototyping was done (linked from the
Ocata design summit &lt;a class="reference external" href="https://etherpad.openstack.org/p/ocata-tripleo-upgrades"&gt;session&lt;/a&gt;). Ultimately the proposal was abandoned but it is
still possible that we will use ansible for the upgrade logic as described
above. We could also explore exposing the resulting ansible playbooks for
advanced operators to invoke as part of their own tooling.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Significant change in the tripleo upgrades workflow.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignee: shardy&lt;/p&gt;
&lt;p&gt;Other contributors: marios, emacchi, matbu, chem, lbezdick&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;Some prototyping was done by shardy in the change
“WIP prototyping composable upgrades with Heat+Ansible”
(&lt;a class="reference external" href="https://review.openstack.org/#/c/393448/"&gt;I39f5426cb9da0b40bec4a7a3a4a353f69319bdf9&lt;/a&gt;).&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Decompose the upgrades logic into each service template in the tht&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Design a workflow that incorporates migrations, the per-service upgrade
scripts and any deployment wide upgrades operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decide how this workflow is to be invoked (mistral? puppet? bash?)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;profit!&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Hopefully we can use the soon to be added upgrades &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1583125"&gt;job&lt;/a&gt; to help with the
development and testing of this feature and obviously guard against changes
that break upgrades. Ideally we will expand that to include jobs for each of
the stable branches (upgrade M-&amp;gt;N and N-&amp;gt;O). The M-&amp;gt;N would exercise the
previous upgrades workflow whereas N-&amp;gt;O would be exercising the work developed
as part of this spec.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Tue, 01 Nov 2016 00:00:00 </pubDate></item><item><title>Tool to Capture Environment Status and Logs</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/capture-environment-status-and-logs.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/capture-environment-status-and-logs"&gt;https://blueprints.launchpad.net/tripleo/+spec/capture-environment-status-and-logs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To aid in troubleshooting, debugging, and reproducing issues we should create
or integrate with a tool that allows an operator or developer to collect and
generate a single bundle that provides the state and history of a deployed
environment.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently there is no single command that can be run via either the
tripleoclient or via the UI that will generate a single artifact to be used
to report issues when failures occur.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-quickstart"&gt;tripleo-quickstart&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/openstack-infra/tripleo-ci"&gt;tripleo-ci&lt;/a&gt; and operators collect the logs for bug
reports in different ways.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When a failure occurs, many different pieces of information must be collected
to be able to understand where the failure occurred. If the required logs are
not explicitly requested, an operator may not know what to provide for
troubleshooting.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;TripleO should provide a unified method for collecting status and logs from the
undercloud and overcloud nodes.  The tripleoclient should support executing a
workflow to run status and log collection processes via &lt;a class="reference external" href="https://github.com/sosreport/sos"&gt;sosreport&lt;/a&gt;. The output
of the &lt;a class="reference external" href="https://github.com/sosreport/sos"&gt;sosreport&lt;/a&gt; should be collected and bundled together in a single location.&lt;/p&gt;
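&lt;p&gt;As a rough illustration of the bundling step only, the following Python sketch
packs per-node report archives into a single tarball. It is not the proposed
Mistral workflow or tripleoclient integration; the function name
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;bundle_reports&lt;/span&gt;&lt;/code&gt;, the node names and the archive layout are
assumptions made for this example, and it relies only on the Python standard library.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import tarfile
from pathlib import Path


def bundle_reports(report_paths, bundle_path):
    """Pack per-node report archives into one tarball, one folder per node.

    report_paths maps a node name (hypothetical, e.g. "undercloud") to the
    report file that was collected from that node.
    """
    bundle_path = Path(bundle_path)
    with tarfile.open(bundle_path, "w:gz") as bundle:
        for node, report in sorted(report_paths.items()):
            report = Path(report)
            # Store each report under its node name so its origin stays clear.
            bundle.add(report, arcname=f"{node}/{report.name}")
    return bundle_path
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Password protection of the bundle, discussed under Security Impact, is
deliberately left out of this sketch.&lt;/p&gt;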
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Currently, various shell scripts and ansible tasks are used by the CI processes
to perform log collection. These scripts are not maintained alongside
core TripleO and may require additional artifacts that are not installed by
default with a TripleO environment.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-quickstart"&gt;tripleo-quickstart&lt;/a&gt; uses &lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-collect-logs"&gt;ansible-role-tripleo-collect-logs&lt;/a&gt; to collect logs.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack-infra/tripleo-ci"&gt;tripleo-ci&lt;/a&gt; uses bash scripts to collect the logs.&lt;/p&gt;
&lt;p&gt;Fuel uses &lt;a class="reference external" href="https://github.com/openstack/timmy"&gt;timmy&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The logs and status information may be considered sensitive information. The
process that triggers status and log collection should require authentication.
Additionally, we should provide a basic password protection mechanism for the
bundle of logs that is created by this process.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;alex-schultz&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Ensure OpenStack &lt;a class="reference external" href="https://github.com/sosreport/sos/tree/master/sos/plugins"&gt;sosreport plugins&lt;/a&gt; are current.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write a TripleO sosreport plugin.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write a &lt;a class="reference external" href="http://docs.openstack.org/developer/mistral/terminology/workflows.html"&gt;Mistral workflow&lt;/a&gt; to execute sosreport and collect artifacts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write &lt;a class="reference external" href="https://github.com/openstack/python-tripleoclient"&gt;python-tripleoclient&lt;/a&gt; integration to execute Mistral workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update documentation and CI scripts to leverage new collection method.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;As part of CI testing, the new tool should be used to collect environment logs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation should be updated to reflect the standard ways to collect the logs
using the tripleo client.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Mon, 31 Oct 2016 00:00:00 </pubDate></item><item><title>Instance High Availability</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/queens/instance-ha.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/instance-ha"&gt;https://blueprints.launchpad.net/tripleo/+spec/instance-ha&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A feature frequently requested by operators and customers is the ability to
automatically resurrect VMs that were running on a compute node that failed (due
to hardware failures, networking issues or general server problems).
Currently we have a downstream-only procedure which consists of many manual
steps to configure Instance HA:
&lt;a class="reference external" href="https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/paged/high-availability-for-compute-instances/chapter-1-overview"&gt;https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/paged/high-availability-for-compute-instances/chapter-1-overview&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;What we would like to implement here is an opt-in, automated
deployment of a cloud that has Instance HA support.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently, if a compute node has a hardware failure or a kernel panic, all the
instances that were running on the node are lost, and manual intervention
is needed to resurrect these instances on another compute node.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The proposed change would be to add a few additional puppet-tripleo profiles that would help
us configure the pacemaker resources needed for instance HA. Unlike in previous iterations,
we won’t need to move nova-compute resources under pacemaker’s management. We managed to
achieve the same result without touching the compute nodes (except for setting
up pacemaker_remote on the computes, but that support exists already).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There are a few specs that are modeling host recovery:&lt;/p&gt;
&lt;p&gt;Host Recovery - &lt;a class="reference external" href="https://review.openstack.org/#/c/386554/"&gt;https://review.openstack.org/#/c/386554/&lt;/a&gt;
Instances auto evacuation - &lt;a class="reference external" href="https://review.openstack.org/#/c/257809"&gt;https://review.openstack.org/#/c/257809&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The first spec uses pacemaker in a very similar way but is too new
and too high level to comment on at this point in time.
The second one has been stalled for a long time and it looks like there
is no consensus yet on the approaches needed. The long-term goal is
to morph the Instance HA deployment into the spec that gets accepted.
We are actively working on both specs as well. In any case, we have
discussed the long-term plan with SUSE and NTT and agreed that
this spec is its first step for TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;No additional security impact.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;End users are not impacted except for the fact that VMs can be resurrected
automatically on a non-failed compute node.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;There are no performance related impacts as compared to a current deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;This change does not affect default deployments. It adds a boolean
and some additional profiles so that a deployer can have a cloud configured with Instance
HA support out of the box.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;One top-level parameter to enable the Instance HA deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Although fencing configuration is already supported by TripleO, we will need
to improve bits and pieces so that we won’t need an extra command to generate the
fencing parameters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upgrades will be impacted by this change in the sense that we will need to make sure to test
them when Instance HA is enabled.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;No developer impact is planned.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;michele&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;cmsj, abeekhof&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Make the fencing configuration fully automated (this is mostly done already, we need oooq integration
and some optimization)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the logic and needed resources on the control-plane&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test the upgrade path when Instance HA is configured&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Testing this manually is fairly simple:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Deploy with Instance HA configured and two compute nodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Spawn a test VM&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Crash the compute node where the VM is running&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Observe the VM being resurrected on the other compute node&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Testing this in CI is doable but might be a bit more challenging due to resource constraints.&lt;/p&gt;
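&lt;p&gt;If the final manual step were automated, the check could look roughly like the
Python sketch below. It assumes a caller-supplied callable,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;get_host&lt;/span&gt;&lt;/code&gt;, that returns the compute node currently hosting the test VM;
that callable, the function name and the default timings are hypothetical and are
not part of this spec or of any TripleO tool.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import time


def wait_for_evacuation(get_host, failed_host, timeout=600, interval=10,
                        sleep=time.sleep):
    """Poll until the VM reports a host other than the failed one.

    get_host is a caller-supplied (hypothetical) callable returning the
    name of the compute node currently hosting the VM, or None if unknown.
    Returns the new host name, or None if the timeout expires first.
    """
    waited = 0
    while True:
        host = get_host()
        if host is not None and host != failed_host:
            return host
        if waited >= timeout:
            return None
        # Wait before asking again; sleep is injectable to ease testing.
        sleep(interval)
        waited += interval
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Passing the lookup in as a callable keeps the sketch independent of how the
host is actually queried in a given environment.&lt;/p&gt;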
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;A section under advanced configuration is needed explaining the deployment of
a cloud that supports Instance HA.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/paged/high-availability-for-compute-instances/"&gt;https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/paged/high-availability-for-compute-instances/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Thu, 20 Oct 2016 00:00:00 </pubDate></item><item><title>Make tripleo third party ci toolset tripleo-quickstart</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/third-party-gating-with-tripleo-quickstart.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/use-tripleo-quickstart-and-tripleo-quickstart-extras-for-the-tripleo-ci-toolset"&gt;https://blueprints.launchpad.net/tripleo/+spec/use-tripleo-quickstart-and-tripleo-quickstart-extras-for-the-tripleo-ci-toolset&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Devstack, being the reference CI deployment of OpenStack, does a good job of
running both in CI and locally on development hardware.
TripleO-Quickstart (TQ) [3] and TripleO-QuickStart-Extras (TQE) can provide
an experience equivalent to devstack both in CI and on local development
hardware. TQE does a nice job of breaking down the steps required to install an
undercloud and deploy an overcloud step by step by creating bash scripts on the
developer’s system and then executing them in the correct order.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently there is a population of OpenStack developers that are unfamiliar
with TripleO and our TripleO CI tools. It’s critical that this population have
a tool which can provide a user experience similar to the one devstack currently
provides OpenStack developers.&lt;/p&gt;
&lt;p&gt;Recreating a deployment failure from TripleO-CI can be difficult for developers
outside of TripleO. Developers may need more than just a script that executes
a deployment. Ideally developers have a tool that provides a high level
overview, a step-by-step install process with documentation, and a way to inject
their local patches or patches from Gerrit into the build.&lt;/p&gt;
&lt;p&gt;Additionally, there may be groups outside of TripleO that want to integrate
additional code or steps into a deployment. In this case the composability of the
CI code is critical to allow others to plug in, extend and create their own steps
for a deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Replace the tools found in openstack-infra/tripleo-ci that drive the deployment
of tripleo with TQ and TQE.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;One alternative is to break down TripleO-CI into composable shell scripts, and
improve the user experience &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/make-tripleo-ci-externally-consumable"&gt;[4]&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;No known additional security vulnerabilities at this time.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;We expect that newcomers to TripleO will have an enhanced experience
reproducing results from CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Using an undercloud image with preinstalled rpms should provide a faster
deployment end-to-end.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None at this time.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Improving the developer experience is the main point of this spec and is
discussed throughout it. This change should provide a quality user experience
for developers wishing to set up TripleO.&lt;/p&gt;
&lt;p&gt;TQE provides a step-by-step, well documented deployment of TripleO.
Furthermore, it is easy to launch and configure:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;bash&lt;/span&gt; &lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sh&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;extras&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yml&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;extras&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;txt&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="nb"&gt;all&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;development&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Everything is executed via bash shell scripts, which are customized
via jinja2 templates. Users can see each command prior to executing it when
running locally. Documentation of the commands that were executed is
automatically generated per execution.&lt;/p&gt;
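&lt;p&gt;The idea of rendering a script from a template before running it can be sketched
as follows. This is only an illustration: it uses Python’s standard-library
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;string.Template&lt;/span&gt;&lt;/code&gt; as a stand-in for jinja2, and the template text, the
function name and the environment file path are assumptions, not the actual TQE templates.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
from string import Template

# Hypothetical template; the real TQE templates use jinja2 syntax instead.
DEPLOY_TEMPLATE = Template(
    "#!/bin/bash\n"
    "set -eux\n"
    "openstack overcloud deploy --templates -e $env_file\n"
)


def render_script(env_file):
    """Return the script text so it can be reviewed before it is run."""
    return DEPLOY_TEMPLATE.substitute(env_file=env_file)


# Print the rendered script instead of executing it.
print(render_script("/home/stack/example-environment.yaml"))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Because the rendered script is plain text, it can be inspected, edited or run
by hand, which is the property the paragraph above relies on.&lt;/p&gt;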
&lt;section id="node-registration-and-introspection-example"&gt;
&lt;h4&gt;Node registration and introspection example:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bash script:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;centos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;rdo&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;jenkins&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;promote&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;newton&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;delorean&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;minimal&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;undercloud&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;prep&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sh&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Execution log:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;centos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;rdo&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;jenkins&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;promote&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;newton&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;delorean&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;minimal&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;undercloud&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;overcloud_prep_images&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generated rST documentation:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;centos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;rdo&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;jenkins&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;promote&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;newton&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;delorean&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;minimal&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;build&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;prep&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="overcloud-deployment-example"&gt;
&lt;h4&gt;Overcloud Deployment example:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bash script:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;centos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;rdo&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;jenkins&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;promote&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;newton&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;delorean&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;minimal_pacemaker&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;undercloud&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;deploy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sh&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Execution log:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;centos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;rdo&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;jenkins&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;promote&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;newton&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;delorean&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;minimal_pacemaker&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;undercloud&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;overcloud_deploy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generated rST documentation:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;centos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;artifacts&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;rdo&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;jenkins&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;promote&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tripleo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;delorean&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;minimal&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;37&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;build&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;deploy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="step-by-step-deployment"&gt;
&lt;h4&gt;Step by Step Deployment:&lt;/h4&gt;
&lt;p&gt;There are times when a developer will want to walk through a deployment step-by-step,
run commands by hand, and try to figure out what exactly is involved with
a deployment. A developer may also want to tweak the settings or add a patch.
To allow this, the deployment cannot simply run from end to end.&lt;/p&gt;
&lt;p&gt;TQE can set up the undercloud and overcloud nodes, and then just add already
configured scripts to install the undercloud and deploy the overcloud
successfully. This essentially allows the developer to ssh to the undercloud and
drive the installation from there with prebuilt scripts.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="o"&gt;./&lt;/span&gt;&lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sh&lt;/span&gt;  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;clone&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;bootstrap&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt; &lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;extras&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;txt&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;playbook&lt;/span&gt; &lt;span class="n"&gt;quickstart&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;extras&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yml&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;skip&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="n"&gt;undercloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;undercloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;install&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;deploy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;overcloud&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;validate&lt;/span&gt; &lt;span 
class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;release&lt;/span&gt; &lt;span class="n"&gt;newton&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;development&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="composability"&gt;
&lt;h4&gt;Composability:&lt;/h4&gt;
&lt;p&gt;TQE is not a single tool; it is a collection of composable Ansible roles. These
Ansible roles can coexist in a single Git repository or be distributed across many
Git repositories. See “Additional References.”&lt;/p&gt;
&lt;p&gt;Why have two projects? Why risk adding complexity?
One of the goals of TQ and TQE is to not assume we are
writing code that works for everyone, on every deployment type, and in any
kind of infrastructure. To ensure that TQE developers cannot block outside
contributions (roles, additions, and customizations to either TQ or TQE),
it was thought best to decouple the two projects and make them as composable
as possible. Ansible playbooks, after all, are best used as a method to just
call roles so that anyone can create playbooks with a variety of roles in the
way that best suits their purpose.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;weshayutin&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;trown&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sshnaidm&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gcerami&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;adarazs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;larks&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Enable third party testing &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-October/105248.html"&gt;[1]&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable TQE to run against the RH2 OVB OpenStack cloud &lt;a class="reference external" href="https://review.openstack.org/#/c/381094/"&gt;[2]&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move the TQE roles into one or many OpenStack Git Repositories, see the roles listed
in the “Additional References”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A decision needs to be made regarding &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-October/105248.html"&gt;[1]&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The work to enable third party testing in rdoproject needs to be completed&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Testing TQE against the RH2 OVB cloud is currently a work in progress &lt;a class="reference external" href="https://review.openstack.org/#/c/381094/"&gt;[2]&lt;/a&gt;. TQE
has been vetted for quite some time with OVB on other clouds.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;What is the impact on the docs? Don’t repeat details discussed above, but
please reference them here.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-October/105248.html"&gt;[1]&lt;/a&gt; – &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-October/105248.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2016-October/105248.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/381094/"&gt;[2]&lt;/a&gt; – &lt;a class="reference external" href="https://review.openstack.org/#/c/381094/"&gt;https://review.openstack.org/#/c/381094/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-third-party-ci-quickstart"&gt;[3]&lt;/a&gt; – &lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-third-party-ci-quickstart"&gt;https://etherpad.openstack.org/p/tripleo-third-party-ci-quickstart&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/make-tripleo-ci-externally-consumable"&gt;[4]&lt;/a&gt; – &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/make-tripleo-ci-externally-consumable"&gt;https://blueprints.launchpad.net/tripleo/+spec/make-tripleo-ci-externally-consumable&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="additional-references"&gt;
&lt;h2&gt;Additional References&lt;/h2&gt;
&lt;section id="tqe-ansible-role-library"&gt;
&lt;h3&gt;TQE Ansible role library&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Undercloud roles:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-baremetal-virt-undercloud"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-baremetal-virt-undercloud&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-pre-deployment-validate"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-pre-deployment-validate&lt;/a&gt; ( under development )&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Overcloud roles:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-config"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-config&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-flavors"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-flavors&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-images"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-images&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-network"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-network&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-ssl"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-ssl&lt;/a&gt;  ( under development )&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Utility roles:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-cleanup-nfo"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-cleanup-nfo&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-collect-logs"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-collect-logs&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-gate"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-gate&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-provision-heat"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-provision-heat&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-image-build"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-image-build&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Post Deployment roles:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-upgrade"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-upgrade&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-scale-nodes"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-scale-nodes&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-tempest"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-tempest&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-validate"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-validate&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-validate-ipmi"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-validate-ipmi&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-validate-ha"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-validate-ha&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Baremetal roles:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-baremetal-prep-virthost"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-baremetal-prep-virthost&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-baremetal"&gt;https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-baremetal&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;/section&gt;
</description><pubDate>Thu, 13 Oct 2016 00:00:00 </pubDate></item><item><title>Puppet Module Deployment via Swift</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/puppet-modules-deployment-via-swift.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/puppet-modules-deployment-via-swift"&gt;https://blueprints.launchpad.net/tripleo/+spec/puppet-modules-deployment-via-swift&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The ability to deploy a local directory of puppet modules to an overcloud
using the OpenStack swift object service.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;When deploying puppet modules to the overcloud there are currently three&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;options:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;pre-install the puppet modules into a “golden” image. You can pre-install
modules via git sources or by using a distro package.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;use a “firstboot” script to rsync the modules from the undercloud (or
some other rsync server that is available).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;post-install the puppet modules onto a running Overcloud server via a
package upgrade, using a distro package (RPM, Deb, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;None of the above mechanisms provides an easy workflow for making
minor (ad-hoc) changes to the puppet modules, and only distro packages can be
used to provide updated puppet modules to an already deployed overcloud.
While we do have a way to rsync updated modules over on “firstboot”,
this isn’t a useful mechanism for operators who may wish to
use heat stack-update to deploy puppet changes without having to build
a new RPM/Deb package for each revision.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Create an optional (opt-in) workflow that, if enabled, will allow an operator
to create and deploy a local artifact (tarball, distro package, etc.) of
puppet modules to a new or existing overcloud via heat stack-create and
stack-update. The mechanism would use the OpenStack object store service
(rather than rsync), which we already have available on the undercloud.
The new workflow would work like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A puppet modules artifact (tarball, distro package, etc.) would be uploaded
into a swift container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The container would be configured so that a Swift Temp URL can be generated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Swift Temp URL would be generated for the puppet modules artifact that is
stored in swift.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A heat environment would be generated which sets a DeployArtifactURLs
parameter to this swift URL. (The parameter could be a list so that
multiple URLs could also be downloaded.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The TripleO Heat Templates would be modified so that they include a new
‘script’ step which, if it detects a custom DeployArtifactURLs parameter,
would automatically download the artifact from the provided URL and
deploy it locally on each overcloud role during the deployment workflow.
By “deploy locally” we mean a tarball would be extracted, an RPM would
get installed, etc. The actual deployment mechanism will be pluggable
such that both tarballs and distro packages will be supported, and future
additions might be added as well so long as they also fit into the generic
DeployArtifactURLs abstraction.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Operator could then use the generated heat environment to deploy
a new set of puppet modules via heat stack-create or heat stack-update.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO client could be modified so that it automatically loads
generated heat environments from a convenient location. This (optional)
extra step would make enabling the above workflow transparent and
only require the operator to run an ‘upload-puppet-modules’ tool to
upload and configure new puppet modules for deployment via Swift.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
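&lt;p&gt;The Temp URL and heat environment steps above can be sketched in Python. This is a minimal illustration and not the tooling this spec proposes: the endpoint, object path, and key are hypothetical placeholders, the helper names make_temp_url and make_heat_environment are invented for this example, and it assumes that the Swift cluster accepts HMAC-SHA1 Temp URL signatures and that DeployArtifactURLs is supplied through the parameter_defaults section of a heat environment.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import hmac
import json
import time
from hashlib import sha1
from urllib.parse import urlencode


def make_temp_url(base_url, object_path, key, ttl_seconds, now=None):
    """Return a Swift Temp URL that allows GET on object_path for ttl_seconds."""
    now = int(time.time()) if now is None else int(now)
    expires = now + ttl_seconds
    # Swift signs the string "METHOD\nEXPIRES\nPATH" with the Temp URL key.
    body = "GET\n{}\n{}".format(expires, object_path)
    signature = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    query = urlencode({"temp_url_sig": signature, "temp_url_expires": expires})
    return "{}{}?{}".format(base_url, object_path, query)


def make_heat_environment(artifact_urls):
    """Build a heat environment mapping that sets DeployArtifactURLs."""
    return {"parameter_defaults": {"DeployArtifactURLs": list(artifact_urls)}}


# All values below are hypothetical placeholders for illustration only.
url = make_temp_url(
    base_url="http://192.0.2.1:8080",
    object_path="/v1/AUTH_demo/overcloud-artifacts/puppet-modules.tar.gz",
    key="example-temp-url-key",
    ttl_seconds=3600,
    now=1475539200,  # fixed timestamp so the example is reproducible
)
environment = make_heat_environment([url])

# JSON is also valid YAML, so this output can be saved as an environment file.
print(json.dumps(environment, indent=2))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Because the signature covers the expiry time, shortening ttl_seconds is enough to limit how long an overcloud node can fetch the artifact without credentials.&lt;/p&gt;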
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There are many alternatives we could use to obtain a similar workflow that
allows the operator to more easily deploy puppet modules from a local directory:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Setting up a puppet master would allow a similar workflow. The downside
of this approach is that it would require a bit of overhead, and it
is puppet specific (the deployment mechanism would need to be re-worked
if we ever had other types of on-disk files to update).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rsync. We already support rsync for firstboot scripts. The downside of
rsync is it requires extra setup, and doesn’t have an API like
OpenStack swift does allowing for local or remote management and updates
to the puppet modules.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The new deployment would use a Swift Temp URL over HTTP/HTTPS. The duration
of the Swift Temp URLs can be controlled when they are signed via
swift-temp-url if extra security is desired. By using a Swift Temp URL we
avoid the need to pass the administrator’s credentials onto each overcloud
node for swiftclient and instead can simply use curl (or wget) to download
the updated puppet modules. Given we already deploy images over HTTP/HTTPS
using an undercloud, the use of Swift in this manner should pose minimal extra
security risks.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The ability to deploy puppet modules via Swift will be opt-in so the
impact on end users would be minimal. The heat templates will contain
a new script deployment that may take a few extra seconds to deploy on
each node (even if the feature is not enabled). We could avoid the extra
deployment time perhaps by noop’ing out the heat resource for the new
swift puppet module deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Developers and Operators would likely be able to deploy puppet module changes
more quickly (without having to create a distro package). The actual deployment
of puppet modules via swift (downloading and extracting the tarball) would
likely be just as fast as a tarball.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Being able to more easily deploy updated puppet modules to an overcloud would
likely speed up the development update and testing cycle of puppet modules.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;dan-prince&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create an upload-puppet-modules script in tripleo-common. Initially this
may be a bash script which we ultimately refine into a Python version if
it proves useful.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify tripleo-heat-templates so that it supports a DeployArtifactURLs
parameter and, if the parameter is set, attempts to deploy the list of
files from this parameter. The actual contents of each file might be
a tarball or a distribution package (RPM).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify tripleoclient so that the workflow around using upload-puppet-modules
can be “transparent”. Simply running upload-puppet-modules would not only
upload the puppet modules but also generate a custom Heat environment that
would then automatically configure heat stack-update/create commands
to use the new URL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update our CI scripts in tripleo-ci and/or tripleo-common so that we
make use of the new Puppet modules deployment mechanism.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update tripleo-docs to make note of the new feature.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
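&lt;p&gt;The pluggable deployment of artifacts listed in DeployArtifactURLs could look roughly like the following Python sketch. It is illustrative only: the function names, the DEPLOYERS registry, and the demo paths are hypothetical, the download step (curl or wget against the Temp URL) is left out, and detection is assumed to rely on the tar format and on the four-byte magic number at the start of an RPM file.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import os
import subprocess
import tarfile
import tempfile

RPM_MAGIC = bytes([0xED, 0xAB, 0xEE, 0xDB])  # first four bytes of an RPM file


def detect_artifact_type(path):
    """Classify a downloaded artifact as 'tarball', 'rpm', or 'unknown'."""
    if tarfile.is_tarfile(path):
        return "tarball"
    with open(path, "rb") as handle:
        if handle.read(4) == RPM_MAGIC:
            return "rpm"
    return "unknown"


def deploy_tarball(path, root):
    """Extract the tarball relative to root."""
    with tarfile.open(path) as archive:
        archive.extractall(root)


def deploy_rpm(path, root):
    """Install or upgrade the package with the rpm CLI (root is unused here)."""
    subprocess.check_call(["rpm", "-Uvh", path])


# Hypothetical registry: a new artifact type needs a detector branch above
# and a matching entry here.
DEPLOYERS = {"tarball": deploy_tarball, "rpm": deploy_rpm}


def deploy_artifact(path, root="/"):
    """Deploy one artifact and return the detected type."""
    kind = detect_artifact_type(path)
    handler = DEPLOYERS.get(kind)
    if handler is None:
        raise ValueError("unsupported artifact: {}".format(path))
    handler(path, root)
    return kind


# Demonstration with a throwaway tarball extracted into a scratch directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "manifest.pp")
with open(source, "w") as handle:
    handle.write("# demo\n")
artifact = os.path.join(workdir, "puppet-modules.tar.gz")
with tarfile.open(artifact, "w:gz") as archive:
    archive.add(source, arcname="etc/puppet/modules/demo/manifest.pp")
target = os.path.join(workdir, "root")
os.makedirs(target)
kind = deploy_artifact(artifact, target)
print(kind)
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Keeping detection separate from the deployers is one way to let future artifact types fit the generic DeployArtifactURLs abstraction without changing the callers.&lt;/p&gt;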
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We would likely want to switch to use this feature in our CI because
it allows us to avoid git cloning the same puppet modules for both
the undercloud and overcloud nodes. Simply calling the extra
upload-puppet-modules script on the undercloud as part of our
deployment workflow would enable the feature and allow it to be tested.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We would need to document the additional (optional) workflow associated
with deploying puppet modules via Swift.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/245314/"&gt;https://review.openstack.org/#/c/245314/&lt;/a&gt; (Add support for DeployArtifactURLs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/245310/"&gt;https://review.openstack.org/#/c/245310/&lt;/a&gt; (Add scripts/upload-swift-artifacts)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/245172/"&gt;https://review.openstack.org/#/c/245172/&lt;/a&gt; (tripleoclient –environment)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
</description><pubDate>Tue, 04 Oct 2016 00:00:00 </pubDate></item><item><title>Deploying TripleO in Containers</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/containerize-tripleo-overcloud.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/containerize-tripleo"&gt;https://blueprints.launchpad.net/tripleo/+spec/containerize-tripleo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Ability to deploy TripleO in Containers.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Linux containers are changing how the industry deploys applications by offering
a lightweight, portable and upgradeable alternative to deployments on a physical
host or virtual machine.&lt;/p&gt;
&lt;p&gt;Since TripleO already manages OpenStack infrastructure by using OpenStack
itself, containers could be a new approach to deploy OpenStack services. It
would change the deployment workflow but could extend upgrade capabilities,
orchestration, and security management.&lt;/p&gt;
&lt;p&gt;Benefits of containerizing the OpenStack services include:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Upgrades can be performed by swapping out containers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Since the entire software stack is held within the container,
interdependencies do not affect deployments of services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containers define explicit state and data requirements. Ultimately, if we
moved to Kubernetes, all volumes would be off the host, making the host
stateless.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easy rollback to working containers if upgrading fails.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Software shipped in each container has been proven to work for this service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mix and match versions of services on the same host.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Immutable containers provide a consistent environment upon startup.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The intention of this blueprint is to introduce containers as a method of
delivering an OpenStack installation. We currently have a fully functioning
containerized version of the compute node but we would like to extend this to
all services. In addition it should work with the new composable roles work that
has been recently added.&lt;/p&gt;
&lt;p&gt;The idea is to have an interface within the heat templates that adds information
for each service to be started as a container. This container format should
closely resemble the Kubernetes definition so we can possibly transition to
Kubernetes in the future. This work has already been started here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/330659/"&gt;https://review.openstack.org/#/c/330659/&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;There are some technology choices that have been made to keep things usable and
practical. These include:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Using Kolla containers. Kolla containers are built using the most popular
operating system choices including CentOS, RHEL, Ubuntu, etc. and are a
good fit for our use case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We are using a heat hook to start these containers directly via docker.
This minimizes the software required on the node and maps directly to the
current baremetal implementation. Also maintaining the heat interface
keeps the GUI functional and allows heat to drive upgrades and changes to
containers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changing the format of container deployment to match Kubernetes for
potential future use of this technology.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using CentOS in full (not CentOS Atomic) on the nodes to allow users to
have a usable system for debugging.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Puppet driven configuration that is mounted into the container at startup.
This allows us to retain our puppet configuration system and operate in
parallel with existing baremetal deployment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="bootstrapping"&gt;
&lt;h3&gt;Bootstrapping&lt;/h3&gt;
&lt;p&gt;Once the node is up and running, there is a systemd service script that runs
which starts the docker agents container. This container has all of the
components needed to bootstrap the system. This includes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;heat agents including os-collect-config, os-apply-config etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;puppet-agent and modules needed for the configuration of the deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;docker client that connects to host docker daemon.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;environment for configuring networking on the host.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;This container acts as a self-installing container. Once started, this
container will use os-collect-config to connect back to heat. The heat agents
then perform the following tasks:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Set up an etc directory and run puppet configuration scripts. This
generates all the config files needed by the services in the same manner
as it would if run without containers. These are copied into a directory
accessible on the host and by all containerized services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Begin starting containerized services and other steps as defined in the
heat template.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Currently all containers are implemented using net=host to allow the services to
listen directly on the host network(s). This maintains functionality in terms of
network isolation and IPv6.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;There shouldn’t be major security impacts from this change. The deployment
shouldn’t be affected negatively by this change from a security standpoint but
unknown issues might be found. SELinux support is implemented in Docker.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="end-user-impact"&gt;
&lt;h3&gt;End User Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Debugging of containerized services will be different as it will require
knowledge about docker (kubernetes in the future) and other tools to access
the information from the containers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Possibly provide more options for upgrades and new versions of services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It will allow for service isolation and better dependency management.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Very little impact:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Runtime performance should remain the same.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We are noticing a slightly longer bootstrapping time with containers but that
should be fixable with a few easy optimizations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="deployer-impact"&gt;
&lt;h3&gt;Deployer Impact&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;From a deployment perspective very little changes:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Deployment workflow remains the same.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There may be more options for versions of different services since we do
not need to worry about interdependency issues with the software stack.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="upgrade-impact"&gt;
&lt;h3&gt;Upgrade Impact&lt;/h3&gt;
&lt;p&gt;This work aims to allow for resilient, transparent upgrades from baremetal
overcloud deployments to container-based ones.&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Initially we need to transition to containers:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Would require node reboots.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automated upgrades should be possible as services are the same, just
containerized.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some state may be moved off nodes to centralized storage. Containers very
clearly define required data and state storage requirements.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;dt&gt;Upgrades could be made easier:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Individual services can be upgraded because of reduced interdependencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is easier to roll back to a previous version of a service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explicit storage of data and state for containers makes it very clear what
needs to be preserved. Ultimately state information and data will likely
not exist on individual nodes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;The developer workflow changes slightly. Instead of interacting with the service
via systemd and log files, you will interact with the service via docker.&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Inside the compute node:&lt;/dt&gt;&lt;dd&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;sudo docker ps -a&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sudo docker logs &amp;lt;container-name&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sudo docker exec -it &amp;lt;container-name&amp;gt; /bin/bash&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Assignee(s)&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;rhallisey
imain
flaper87
mandre&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;dprince
emilienm&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Heat Docker hook that starts containers (DONE)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerized Compute (DONE)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO CI job (INCOMPLETE - &lt;a class="reference external" href="https://review.openstack.org/#/c/288915/"&gt;https://review.openstack.org/#/c/288915/&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerized Controller&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatically build containers for OpenStack services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Containerized Undercloud&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Composable roles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Heat template interface which allows extensions to support containerized
service definitions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;TripleO CI would need a new Jenkins job that will deploy an overcloud in
containers by using the selected solution.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://github.com/openstack/tripleo-heat-templates/blob/master/docker/README-containers.md"&gt;https://github.com/openstack/tripleo-heat-templates/blob/master/docker/README-containers.md&lt;/a&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Deploying TripleO in containers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Debugging TripleO containers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.docker.com/misc/"&gt;https://docs.docker.com/misc/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-docker-puppet"&gt;https://etherpad.openstack.org/p/tripleo-docker-puppet&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.docker.com/articles/security/"&gt;https://docs.docker.com/articles/security/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://docs.openstack.org/developer/kolla/"&gt;http://docs.openstack.org/developer/kolla/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/209505/"&gt;https://review.openstack.org/#/c/209505/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/227295/"&gt;https://review.openstack.org/#/c/227295/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Tue, 27 Sep 2016 00:00:00 </pubDate></item><item><title>Composable HA architecture</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/composable-ha-architecture.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/composable-ha"&gt;https://blueprints.launchpad.net/tripleo/+spec/composable-ha&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Since Newton, we have the following services managed by pacemaker:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Cloned and master/slave resources:
galera, redis, haproxy, rabbitmq&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Active/Passive resources:
VIPs, cinder-volume, cinder-backup, manila-share&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is currently not possible to compose the above services in the same
way as we do today via composable roles for the non-pacemaker services.
This spec aims to address this limitation and let the operator be more flexible
in the composition of the control plane.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently tripleo has implemented no logic whatsoever to assign specific pacemaker
managed services to roles/nodes.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Since we do not have a lot in terms of hard performance data, we typically support
three controller nodes. This is perceived as a scalability limiting factor and there is
a general desire to be able to assign specific nodes to specific pacemaker-managed
services (e.g. three nodes only for galera, five nodes only for rabbitmq)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Right now, if the operator deploys on N controllers, they will get N cloned instances
of the non-A/P pacemaker services on the same N nodes. We want to be
much more flexible, e.g. deploy galera on the first 3 nodes, rabbitmq on the
remaining 5 nodes, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is also desirable for the operator to be able to choose on which nodes the A/P
resources will run.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We also currently have a scalability limit of 16 nodes for the pacemaker cluster.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The proposal here is to keep the existing cluster in its current form, but to extend
it in two ways:
A) Allow the operator to include a specific service in a custom node and have pacemaker
run that resource only on that node. E.g. the operator can define the following custom nodes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Node A
pacemaker
galera&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Node B
pacemaker
rabbitmq&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Node C
pacemaker
VIPs, cinder-volume, cinder-backup, manila-share, redis, haproxy&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With the above definition the operator can instantiate any number of A, B or C nodes
and scale up to a total of 16 nodes. Pacemaker will place the resources only on
the appropriate nodes.&lt;/p&gt;
&lt;p&gt;B) Allow the operator to extend the cluster beyond 16 nodes via pacemaker remote.
For example an operator could define the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Node A
pacemaker
galera
rabbitmq&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Node B
pacemaker-remote
redis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Node C
pacemaker-remote
VIPs, cinder-volume, cinder-backup, manila-share, redis, haproxy&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This second scenario would allow an operator to extend beyond the 16-node limit.
The only difference from scenario A) is that the quorum of the cluster is
obtained only from the nodes of type Node A.&lt;/p&gt;
&lt;p&gt;The way this would work is that the placement on nodes would be controlled by location
rules that would work based on node properties matching.&lt;/p&gt;
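&lt;p&gt;As a rough illustration of placement by node property matching, the following
Python sketch computes which nodes each resource may run on. It only models the
idea and is not how pacemaker evaluates location rules; the property names
(such as galera-role), the node names and the data layout are assumptions made
for this example.&lt;/p&gt;

```python
# Minimal sketch of location rules driven by node properties.
# All property names, node names and structures here are hypothetical.

# Node properties as a deployment tool might set them on each cluster node.
NODES = {
    "node-a-0": {"galera-role": "true"},
    "node-a-1": {"galera-role": "true"},
    "node-b-0": {"rabbitmq-role": "true"},
    "node-c-0": {"haproxy-role": "true", "redis-role": "true"},
}

# One location rule per resource: resource name mapped to the node
# property that must be set for the resource to be allowed on a node.
LOCATION_RULES = {
    "galera": "galera-role",
    "rabbitmq": "rabbitmq-role",
    "redis": "redis-role",
    "haproxy": "haproxy-role",
}


def eligible_nodes(nodes, required_property):
    """Return the sorted names of nodes that carry the required property."""
    return sorted(
        name
        for name, props in nodes.items()
        if props.get(required_property) == "true"
    )


def place_resources(nodes, rules):
    """Map each resource to the nodes its location rule allows."""
    return {res: eligible_nodes(nodes, prop) for res, prop in rules.items()}


if __name__ == "__main__":
    for resource, hosts in place_resources(NODES, LOCATION_RULES).items():
        print(resource, hosts)
```

&lt;p&gt;In this sketch, adding another Node A simply means adding one more entry that
carries the galera-role property; the rules themselves do not change.&lt;/p&gt;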
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Several alternative designs were discussed and evaluated:
A) A cluster per service:&lt;/p&gt;
&lt;p&gt;One possible architecture would be to create a separate pacemaker cluster for
each HA service. This has been ruled out mainly for the following reasons:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;It cannot be done outside of containers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It would create a lot of network traffic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It would increase the management/monitoring of the pacemaker resources and clusters
exponentially&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Each service would still be limited to 16 nodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A new container fencing agent would have to be written&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol class="upperalpha simple" start="2"&gt;
&lt;li&gt;&lt;p&gt;A single cluster where only the clone-max property is set for the non A/P services&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This would still be a single cluster, but unlike today where the cloned and
master/slave resources run on every controller we would introduce variables to
control the maximum number of nodes a resource could run on. E.g.
GaleraResourceCount would set clone-max to a value different than the number of
controllers. Example: 10 controllers, galera has clone-max set to 3, rabbit to
5 and redis to 3.
While this would be rather simple to implement and would change very little in the
current semantics, this design was ruled out:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We’d still have the 16 nodes limit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It would not provide fine grained control over which services live on which nodes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;No changes regarding security aspects compared to the existing status quo.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;No particular impact except added flexibility in placing pacemaker-managed resources.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The performance impact here is that with the added scalability it will be possible for
an operator to dedicate specific nodes for certain pacemaker-managed services.
There are no changes in terms of code, only a more flexible and scalable way to deploy
services on the control plane.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;This proposal aims to use the same method that the custom roles introduced in Newton
use to tailor the services running on a node. With the very same method it will be possible
to do that for the HA services managed by pacemaker today.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;No impact&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;michele&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;cmsj, abeekhof&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;We need to work on the following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Add location rule constraints support in puppet&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make puppet-tripleo set node properties on the nodes where a service profile is applied&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create corresponding location rules&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add a puppet-tripleo pacemaker-remote profile&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;No additional dependencies are required.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We will need to test the flexible placement of the pacemaker-managed services
within the CI. This can be done within today’s CI limitations (i.e. in the three
controller HA job we can make sure that the placement is customized and working)&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;No impact&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Mostly internal discussions within the HA team at Red Hat&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Sun, 25 Sep 2016 00:00:00 </pubDate></item><item><title>Step by step validation</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/step-by-step-validation.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/step-by-step-validation"&gt;https://blueprints.launchpad.net/tripleo/+spec/step-by-step-validation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Validate each step during the installation to be able to stop fast in
case of errors and provide feedback on which components are in error.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;During deployment, problems are often spotted at the end of the
configuration and can accumulate on top of each other making it
difficult to find the root cause.&lt;/p&gt;
&lt;p&gt;Deployers and developers will benefit from an installation
process that fails fast and pinpoints the lowest-level components
causing the problem.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Leverage the steps already defined in TripleO to run a validation tool
at the end of each step.&lt;/p&gt;
&lt;p&gt;During each step, collect assertions about what components are
configured on each host. Then, at the end of the step, run a validation
tool that consumes the assertions and reports all the failed ones.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We could use Puppet to add assertions in the code to validate what has
been configured. The drawback of this approach is that it is difficult to
get good reporting on what the issues are, compared to a specialized
tool that can be run outside of the installer if needed.&lt;/p&gt;
&lt;p&gt;The other drawback to this approach is that it can’t be reused in
future if/when we support non-puppet configuration and it probably
also can’t be used when we use puppet to generate an external config
file for containers.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;some validations may require access to sensitive data like passwords
or keys to access the components.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;This feature will be activated automatically in the installer.&lt;/p&gt;
&lt;p&gt;If needed, the deployer or developer will be able to launch the tool
by hand to validate a set of assertions.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;We expect the validations to take less than one minute per step.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The objective is to have a faster iterative process by failing fast.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Each configuration module will need to generate assertions to be
consumed by the validation tool.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Note that this approach (multiple step application of ansible in
localhost mode via heat) has been prototyped for upgrades, and it should
work well for validations too.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/393448/"&gt;https://review.openstack.org/#/c/393448/&lt;/a&gt;&lt;/p&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignee: &amp;lt;&lt;a class="reference external" href="mailto:shardy%40redhat.com"&gt;shardy&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Other contributors to help validate services:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&amp;lt;launchpad-id or None&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;generate assertions about the configured components on the server
being configured in yaml files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;implement the validation tool leveraging the work that has already
been done in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tripleo-validations&lt;/span&gt;&lt;/code&gt; that will do the following
steps:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;collect yaml files from the servers on the undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;run validations in parallel on each server from the undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;report all issues and exit with 0 if no error or 1 if there is at
least one error.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
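&lt;p&gt;The three steps of the validation tool could look roughly like the Python
sketch below. It is only an illustration under stated assumptions: the
assertions are shown as already-loaded dictionaries rather than collected yaml
files, and the assertion fields (name, expected, observed), the server names
and the helper functions are hypothetical, not part of tripleo-validations.&lt;/p&gt;

```python
# Minimal sketch: validate every server in parallel, report all failures,
# and produce exit code 0 (no error) or 1 (at least one error).
# The assertion format and helper names are hypothetical.
import sys
from concurrent.futures import ThreadPoolExecutor


def check_assertion(assertion):
    """Hypothetical check: the observed value must equal the expected one."""
    return assertion["expected"] == assertion["observed"]


def validate_server(assertions):
    """Return the names of the assertions that failed for one server."""
    return [a["name"] for a in assertions if not check_assertion(a)]


def run_validations(assertions_by_server):
    """Validate all servers in parallel; return (exit_code, failures)."""
    servers = sorted(assertions_by_server)
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = pool.map(
            validate_server, [assertions_by_server[s] for s in servers]
        )
        failures = {s: failed for s, failed in zip(servers, results) if failed}
    return (1 if failures else 0), failures


# Stand-in for assertions collected from the servers (normally yaml files).
SAMPLE = {
    "controller-0": [
        {"name": "keystone-port", "expected": 5000, "observed": 5000},
    ],
    "controller-1": [
        {"name": "rabbitmq-port", "expected": 5672, "observed": None},
    ],
}

if __name__ == "__main__":
    exit_code, failed = run_validations(SAMPLE)
    for server, names in failed.items():
        print(server, "failed:", ", ".join(names))
    sys.exit(exit_code)
```

&lt;p&gt;Collecting every failure before exiting, rather than stopping at the first
one, matches the goal of reporting all issues found in a step.&lt;/p&gt;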
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;To be added.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The change will be used automatically in the CI so it will always be tested.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We’ll need to document integration with whatever validation tool is
used, e.g. so that those integrating new services (or in future
out-of-tree additional services) can know how to integrate with the
validation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;A similar approach was used in SpinalStack using serverspec. See
&lt;a class="reference external" href="https://github.com/redhat-cip/config-tools/blob/master/verify-servers.sh"&gt;https://github.com/redhat-cip/config-tools/blob/master/verify-servers.sh&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A collection of Ansible playbooks to detect and report potential
issues during TripleO deployments:
&lt;a class="reference external" href="https://github.com/openstack/tripleo-validations"&gt;https://github.com/openstack/tripleo-validations&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Prototype of composable upgrades with Heat+Ansible:
&lt;a class="reference external" href="https://review.openstack.org/#/c/393448/"&gt;https://review.openstack.org/#/c/393448/&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Mon, 19 Sep 2016 00:00:00 </pubDate></item><item><title>composable-undercloud</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/undercloud-heat.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/heat-undercloud"&gt;https://blueprints.launchpad.net/tripleo/+spec/heat-undercloud&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Deploy the undercloud with Heat instead of elements. This will allow us to use
composable services for the Undercloud and better fits with the architecture
of TripleO (providing a feedback loop between the Undercloud and Overcloud).
Furthermore this gets us a step closer to an HA undercloud and will help
us potentially convert the Undercloud to containers as work is ongoing
in t-h-t for containers as well.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The Undercloud today uses instack-undercloud. Instack undercloud is built
around the concept of ‘instack’ which uses elements to install services.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;When instack-undercloud started we shared elements across the undercloud
and overcloud via the tripleo-image-elements project. This is no longer the
case, thus we have lost the feedback loop of using the same elements in
both the overcloud and undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We retro-fitted instack-undercloud with a single element called
puppet-stack-config that contains a single (large) puppet manifest for
all the services. Being able to compose the Undercloud would be more
scalable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A maintenance problem. Ideally we could support the under and overcloud with the same tooling.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;We can use a single process Heat API/Engine in noauth mode to leverage
recent “composable services” work in the tripleo-heat-templates project.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A new heat-all launcher will be created.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will run the heat-all launcher with “noauth” middleware to skip keystone
auth at a high level.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The heat-all process will use fake RPC driver and SQLite thus avoiding
the need to run RabbitMQ or MySQL on the deployment server for bootstrapping.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To satisfy client library requirements inside heat we will run a fake keystone
API (a thread in our installer perhaps), that will return just enough to
make these clients functionally work in noauth mode.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The new “deployed-server” feature in tripleo-heat-templates will make
it possible to create Heat “server” objects and thus run
OS::Heat::SoftwareDeployment resources on pre-installed servers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will use os-collect-config to communicate with the local Heat API via
the Heat signal transport. We will run os-collect-config until the
stack finishes processing and either completes or fails.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create another tool which can read composable services in
tripleo-heat-templates. This tool would be required to have feature
parity with Heat such that things like parameters, nested stacks,
environments all worked in a similar fashion so that we could share the
template work across the Undercloud and Overcloud. This approach isn’t
really feasible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use an alternate tool like Ansible. This would mean creating duplicate services
in Ansible playbooks for each service we require in the Undercloud. This
approach isn’t ideal in that it involves duplicate work across the Undercloud
and Overcloud. Ongoing work around multi-node configuration and containers
would need to be duplicated into both the Overcloud (tripleo-heat-templates)
and Undercloud (Ansible) frameworks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The approach would run Heat on a single node in noauth mode. Heat
API and the fake Keystone stub would listen on 127.0.0.1 only. This
would be similar to other projects which allow noauth in local mode
as well.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We would again have a single template language driving our Undercloud
and Overcloud tooling. Heat templates are very well documented.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Initial testing shows the single process Heat API/Engine is quite light
taking only 70MB of RAM on a machine.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The approach is likely to be on-par with the performance of
instack-undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The format of undercloud.conf may change. We will add a
‘compat’ layer which takes the format of ‘undercloud.conf’ today
and sets Heat parameters and/or includes Heat environments to give
feature parity and an upgrade path for existing users. Additional
CI jobs will also be created to ensure users who upgrade from
previous instack environments can use the new tool.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
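&lt;p&gt;One possible shape for such a ‘compat’ layer is sketched below in Python: it
reads options from an undercloud.conf style file and emits a Heat environment
with a parameter_defaults section. This is a sketch under assumptions, not the
planned implementation; in particular the Heat parameter names in the mapping
are invented for illustration.&lt;/p&gt;

```python
# Minimal sketch of a 'compat' layer: undercloud.conf options in,
# Heat environment (parameter_defaults) out.
# The Heat parameter names below are hypothetical.
import configparser

OPTION_TO_PARAMETER = {
    "undercloud_hostname": "UndercloudHostname",
    "local_ip": "UndercloudLocalIp",
    "local_interface": "UndercloudLocalInterface",
}


def conf_to_environment(conf_text):
    """Translate the DEFAULT section of the config into a Heat environment."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    defaults = parser.defaults()
    params = {
        heat_name: defaults[option]
        for option, heat_name in OPTION_TO_PARAMETER.items()
        if option in defaults  # options left unset keep the template default
    }
    return {"parameter_defaults": params}


SAMPLE_CONF = """
[DEFAULT]
undercloud_hostname = undercloud.example.com
local_ip = 192.0.2.1/24
"""

if __name__ == "__main__":
    print(conf_to_environment(SAMPLE_CONF))
```

&lt;p&gt;Keeping the translation in a single mapping table would make it easy to see
which legacy options are covered and which still need a Heat equivalent.&lt;/p&gt;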
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Developers would be able to do less work to maintain the Undercloud by
sharing composable services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Future work around composable upgrades could also be utilized and shared
across the Undercloud and Overcloud.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;dprince (dan-prince on LP)&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create heat-all launcher.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create python-tripleoclient command to run ‘undercloud deploy’.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create undercloud.yaml Heat templates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Heat all launcher and noauth middleware.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Swapping in the new Undercloud as part of CI should allow us to fully test it.&lt;/p&gt;
&lt;p&gt;Additionally, we will have an upgrade job that tests an upgrade from
an instack-undercloud installation to a new t-h-t driven Undercloud install.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation changes will need to be made to explain the new config
interfaces (Heat parameters and environments). We could minimize doc changes
by developing a ‘compat’ interface to process the legacy undercloud.conf
and perhaps even re-use the ‘undercloud install’ task in python-tripleoclient
as well so it essentially acts the same on the CLI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Onward dark owl presentation: &lt;a class="reference external" href="https://www.youtube.com/watch?v=y1qMDLAf26Q"&gt;https://www.youtube.com/watch?v=y1qMDLAf26Q&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-composable-undercloud"&gt;https://etherpad.openstack.org/p/tripleo-composable-undercloud&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/heat-undercloud"&gt;https://blueprints.launchpad.net/tripleo/+spec/heat-undercloud&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Sun, 07 Aug 2016 00:00:00 </pubDate></item><item><title>Enable deployment of centralized logging</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/tripleo-opstools-centralized-logging.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-opstools-centralized-logging"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-opstools-centralized-logging&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;TripleO should deploy with an out-of-the-box centralized logging
solution to serve the overcloud.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;With a complex distributed system like OpenStack, identifying and
diagnosing a problem may require tracking a transaction across many
different systems and many different logfiles.  In the absence of a
centralized logging solution, this process is frustrating to both new
and experienced operators and can make even simple problems hard to
diagnose.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;We will deploy the &lt;a class="reference external" href="http://www.fluentd.org/"&gt;Fluentd&lt;/a&gt; service in log collecting mode as a
composable service on all nodes in the overcloud stack when configured
to do so by the environment.  Each composable service will have its
own fluentd source configuration.&lt;/p&gt;
&lt;p&gt;To receive these messages, we will deploy a centralized logging system
running &lt;a class="reference external" href="https://www.elastic.co/products/kibana"&gt;Kibana&lt;/a&gt;, &lt;a class="reference external" href="https://www.elastic.co/"&gt;Elasticsearch&lt;/a&gt; and Fluentd on dedicated nodes to
provide log aggregation and analysis.  This will be deployed in a
dedicated Heat stack that is separate from the overcloud stack using
composable roles.&lt;/p&gt;
&lt;p&gt;We will also support sending messages to an external Fluentd
instance not deployed by tripleo.&lt;/p&gt;
&lt;section id="summary-of-use-cases"&gt;
&lt;h3&gt;Summary of use cases&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Elasticsearch, Kibana and Fluentd log relay/transformer deployed as
a separate Heat stack in the overcloud stack; Fluentd log
collector deployed on each overcloud node&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ElasticSearch, Kibana and Fluentd log relay/transformer deployed in
external infrastructure; Fluentd log collector deployed on each
overcloud node&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Data collected from the logs of OpenStack services can contain
sensitive information:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Communication between the
fluentd agent and the log aggregator should be protected with SSL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Access to the Kibana UI must have at least basic HTTP
authentication, and client access should be via SSL.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elasticsearch should only allow connections over &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;localhost&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Additional resources will be required for running Fluentd on overcloud
nodes.  Log traffic from the overcloud nodes to the log aggregator
will consume some bandwidth.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Fluentd will be deployed on all overcloud nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New parameters for configuring Fluentd collector.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New parameters for configuring log collector (Fluentd,
ElasticSearch, and Kibana)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Support for the new node type should be implemented for tripleo-quickstart.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Martin Mágr &amp;lt;&lt;a class="reference external" href="mailto:mmagr%40redhat.com"&gt;mmagr&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;
Lars Kellogg-Stedman &amp;lt;&lt;a class="reference external" href="mailto:lars%40redhat.com"&gt;lars&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;puppet-tripleo profile for fluentd service&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates composable role for Fluentd collector deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates composable role for Fluentd aggregator deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates composable role for ElasticSearch deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates composable role for Kibana deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for logging node in tripleo-quickstart&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Puppet module for Fluentd: &lt;cite&gt;konstantin-fluentd&lt;/cite&gt; [1]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Puppet module for ElasticSearch &lt;cite&gt;elasticsearch-elasticsearch&lt;/cite&gt; [2]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Puppet module for Kibana (tbd)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CentOS Opstools SIG package repository&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Fluentd client deployment will be tested by current TripleO CI as soon as
the patch is merged. Because the centralized logging features will not
be enabled by default we may need to introduce specific tests for
these features.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The process of creating the new node type and the new options will have to be documented.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://forge.puppet.com/srf/fluentd"&gt;https://forge.puppet.com/srf/fluentd&lt;/a&gt;
[2] &lt;a class="reference external" href="https://forge.puppet.com/elasticsearch/elasticsearch"&gt;https://forge.puppet.com/elasticsearch/elasticsearch&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Wed, 03 Aug 2016 00:00:00 </pubDate></item><item><title>TripleO Undercloud NTP Server</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/ocata/undercloud-ntp-server.html</link><description>
 
&lt;p&gt;The Undercloud should provide NTP services for when external NTP services are
not available.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;NTP services are required to deploy with HA, but we rely on external services.
This means that TripleO can’t be installed without Internet access or a local
NTP server.&lt;/p&gt;
&lt;p&gt;This has several drawbacks:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The NTP server is a potential point of failure, and it is an external
dependency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Isolated deployments without Internet access are not possible without
additional effort (manually deploying an NTP server).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Infra CI is dependent on an external resource, leading to potential
false negative test runs or CI failures.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;In order to address this problem, the Undercloud installation process should
include setting up an NTP server on the local Undercloud. The use of this
NTP server would be optional, but we may wish to make it a default. Having
a default is better than none, since HA deployments will fail without time
synchronization between the controller cluster members.&lt;/p&gt;
&lt;p&gt;The operation of the NTP server on the Undercloud would be primarily of use
in small or proof-of-concept deployments. It is expected that sufficiently
large deployments will have an infrastructure NTP server already operating
locally.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The alternative is to continue to require external NTP services, or to
require manual steps to set up a local NTP server.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Since the NTP server is required to keep the HA cluster in sync, a skewed clock on one
controller (in relation to the other controllers) may make it ineligible to
participate in the HA cluster. If more than one controller’s clock is skewed,
the entire cluster will fail to operate. This opens up an opportunity for
denial-of-service attacks against the cloud, either by causing NTP updates
to fail, or using a man-in-the-middle attack where deliberately false NTP
responses are returned to the controllers.&lt;/p&gt;
&lt;p&gt;Of course, operating the NTP server on the Undercloud moves that attack
vector down to the Undercloud, so sufficient security hardening should be done
on the Undercloud and/or the attached networks. We may wish to bind the NTP
server only to the provisioning (control plane) network.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;This may make the life of the installer easier, since they don’t need to open
a network connection to an NTP server or set up a local NTP server.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The operation of the NTP server should have a negligible impact on Undercloud
performance. NTP is a lightweight protocol and the daemon requires few
resources.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;We currently require that a valid NTP server be configured either in the templates
or on the deployment command line. Setting it explicitly would become optional if we had
a default pointing to NTP services on the Undercloud.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="mailto:dsneddon%40redhat.com"&gt;dsneddon&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="mailto:bfournie%40redhat.com"&gt;bfournie&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;The TripleO Undercloud installation scripts will have to be modified to include
the installation and configuration of an NTP server. This will likely be done
using a composable service for the Undercloud, with configuration data taken
from undercloud.conf. The configuration should include a set of default NTP
servers which are reachable on the public Internet for when no servers are
specified in undercloud.conf.&lt;/p&gt;
&lt;p&gt;Implement opening up iptables for NTP on the control plane network (bound to
only one IP/interface [ctlplane]  if possible).&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;The NTP server RPMs must be installed, and upstream NTP servers must be
identified (although we might configure a default such as pool.ntp.org)&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Since proper operation of the NTP services is required for successful
deployment of an HA overcloud, this functionality will be tested every time
a TripleO CI HA job is run.&lt;/p&gt;
&lt;p&gt;We may also want to implement a validation that ensures that the NTP server
can reach its upstream stratum 1 servers. This will ensure that the NTP
server is serving up the correct time. This is optional, however, since the
only dependency is that the overcloud nodes agree on the time, not that it
be correct.&lt;/p&gt;
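&lt;p&gt;A validation along these lines could be sketched in Python with only the
standard library. The sketch sends an SNTP client request to the Undercloud NTP
server and checks that the reply reports a synchronized stratum, which implies the
server has reached an upstream source. The function names and the timeout are
assumptions made for illustration, not part of any existing TripleO validation.&lt;/p&gt;

```python
import socket
import struct

NTP_PORT = 123
# Seconds between the NTP epoch (1900) and the Unix epoch (1970).
NTP_UNIX_DELTA = 2208988800


def build_request() -> bytes:
    """Return a 48-byte SNTP client request (LI=0, VN=3, Mode=3)."""
    first_byte = 3 * 8 + 3  # version 3 in bits 3-5, mode 3 in bits 0-2
    return bytes([first_byte]) + bytes(47)


def parse_reply(packet: bytes) -> dict:
    """Extract mode, stratum and transmit time from an NTP reply."""
    header = packet[:48]
    if len(header) != 48:
        raise ValueError("NTP reply too short")
    transmit_seconds = struct.unpack("!I", header[40:44])[0]
    return {
        "mode": header[0] % 8,  # low three bits of the first byte
        "stratum": header[1],
        "transmit_unix": transmit_seconds - NTP_UNIX_DELTA,
    }


def is_synchronized(reply: dict) -> bool:
    """Mode 4 (server) with stratum 1..15 means a synchronized source.

    Stratum 0 is unspecified and 16 means unsynchronized.
    """
    return reply["mode"] == 4 and reply["stratum"] in range(1, 16)


def query(host: str, timeout: float = 2.0) -> dict:
    """Send one request to host over UDP and parse the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (host, NTP_PORT))
        packet, _ = sock.recvfrom(512)
    return parse_reply(packet)
```

&lt;p&gt;A validation could call &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;is_synchronized(query(host))&lt;/span&gt;&lt;/code&gt; against the Undercloud
address on the control plane network and report a warning rather than a failure,
since the check is optional.&lt;/p&gt;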
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The setup and configuration of the NTP server should be documented. Basic NTP
best practices should be communicated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;[1] - Administration Guide Draft/NTP - Fedora Project
&lt;a class="reference external" href="https://fedoraproject.org/wiki/Administration_Guide_Draft/NTP"&gt;https://fedoraproject.org/wiki/Administration_Guide_Draft/NTP&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 03 Aug 2016 00:00:00 </pubDate></item><item><title>Undercloud Upgrade</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/undercloud-upgrade.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/undercloud-upgrade"&gt;https://blueprints.launchpad.net/tripleo/+spec/undercloud-upgrade&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Our currently documented upgrade path for the undercloud is very problematic.
In fact, it doesn’t work.  A number of different patches are attempting to
address this problem (see the &lt;a class="reference internal" href="#references"&gt;References&lt;/a&gt; section), but they all take slightly
different approaches that are not necessarily compatible with each other.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The undercloud upgrade must be carefully orchestrated.  A few of the problems
that can be encountered during an undercloud upgrade if steps are skipped
or done in the wrong order:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Services may fail and get stuck in a restart loop&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Service databases may not be properly upgraded&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Services may fail to stop and prevent the upgraded version from starting&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Currently there is no agreement on who should be responsible for running
the various steps of the undercloud upgrade.  Getting everyone on the same
page regarding this is the ultimate goal of this spec.&lt;/p&gt;
&lt;p&gt;Also of note is the MariaDB major version update flow from
&lt;a class="reference internal" href="#upgrade-documentation-under-and-overcloud"&gt;Upgrade documentation (under and overcloud)&lt;/a&gt;.  This will need to be
addressed as part of whatever upgrade solution we decide to pursue.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;I’m going to present my proposed solution here, but will try to give a fair
overview of the other proposals in the &lt;a class="reference internal" href="#alternatives"&gt;Alternatives&lt;/a&gt; section.  Others
should feel free to push modifications or follow-ups if I miss anything
important, however.&lt;/p&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;Services must be stopped before their respective package update is run.
This is because the RPM specs for the services include a mandatory restart to
ensure that the new code is running after the package is updated.  On a major
version upgrade, this can and does result in broken services because the config
files are not always forward compatible, so until Puppet is run again to
configure them appropriately the service cannot start.  The broken services
can cause other problems as well, such as the yum update taking an excessively
long time because it times out waiting for the service to restart.  It’s worth
noting that this problem does not exist on an HA overcloud because Pacemaker
stubs out the service restarts in the systemd services so the package update
restart becomes a noop.&lt;/p&gt;
&lt;p&gt;Because the undercloud is not required to have extremely high uptime, I am in
favor of just stopping all of the services, updating all the packages, then
re-running the undercloud install to apply the new configs and start the
services again.  This ensures that the services are not restarted by the
package update - which only happens if the service was running at the time of
the update - and that there is no chance of an old version of a service being
left running and interfering with the new version, as can happen when moving
a service from a standalone API process to httpd.&lt;/p&gt;
&lt;p&gt;instack-undercloud will be responsible for implementing the process described
above.  However, to avoid complications with instack-undercloud trying to
update itself, tripleoclient will be responsible for updating
instack-undercloud and its dependencies first.  This two-step approach
should allow us to sanely use an older tripleoclient to run the upgrade
because the code in the client will be minimal and should not change from
release to release.  Upgrade-related backports to stable clients should not
be needed in any foreseeable case.  Any potential version-specific logic can
live in instack-undercloud.  The one exception is that we may need to
initially backport this new process to the previous stable branch so we can
start using it without waiting an entire cycle.  Since the current upgrade
process does not work correctly there, I think this would be a valid bug fix
backport.&lt;/p&gt;
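&lt;p&gt;The ordering described above can be sketched as a plan of shell commands.
This is only an illustration: the function name, the service names passed in and
the exact yum invocations are assumptions, and the real logic would live in
tripleoclient and instack-undercloud. Nothing is executed by the sketch itself.&lt;/p&gt;

```python
from typing import List

Command = List[str]


def upgrade_plan(services: List[str]) -> List[Command]:
    """Return the commands for an undercloud upgrade, in order.

    The plan is only built here; the caller decides how to run it.
    """
    plan: List[Command] = []
    # Step 1 (client side): update instack-undercloud first, so the new
    # version of the tool drives the rest of the upgrade.
    plan.append(["yum", "-y", "update", "instack-undercloud"])
    # Step 2: stop services so the package update cannot restart them
    # against config files that have not been regenerated yet.
    for service in services:
        plan.append(["systemctl", "stop", service])
    # Step 3: update all remaining packages while services are down.
    plan.append(["yum", "-y", "update"])
    # Step 4: re-run the install to apply new configs and start services.
    plan.append(["openstack", "undercloud", "install"])
    return plan
```

&lt;p&gt;Keeping the client-side step to a single package update is what allows an older
tripleoclient to start the process without version-specific logic.&lt;/p&gt;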
&lt;p&gt;A potential drawback of this approach is that it will not automatically
trigger the Puppet service db-syncs because Puppet is not aware that the
version has changed if we update the packages separately.  However, I feel
this is a case we need to handle sanely anyway in case a package is updated
outside Puppet either intentionally or accidentally.  To that end, we’ve
already merged a patch to always run db-syncs on the undercloud since they’re
idempotent anyway.  See &lt;a class="reference internal" href="#stop-all-services-before-upgrading"&gt;Stop all services before upgrading&lt;/a&gt; for a link to
the patch.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="mariadb"&gt;
&lt;h3&gt;MariaDB&lt;/h3&gt;
&lt;p&gt;Regarding the MariaDB issue mentioned above, I believe that regardless of the
approach we take, we should automate the dump and restore of the database as
much as possible.  Either solution should be able to look at the version of
mariadb before yum update and the version after, and decide whether the db
needs to be dumped.  If a user updates the package manually outside the
undercloud upgrade flow then they will be responsible for the db upgrade
themselves.  I think this is the best we can do, short of writing some sort
of heuristic that can figure out whether the existing db files are for an
older version of MariaDB and doing the dump/restore based on that.&lt;/p&gt;
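&lt;p&gt;The version comparison could be sketched as follows. The sketch assumes that
the first two components of the version string (for example 5.5 or 10.1) identify
the MariaDB major release, and that the "after" version can be obtained before
the data directory is touched, for instance from the candidate package. The
function names are hypothetical.&lt;/p&gt;

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '5.5.50' or '10.1.14-1.el7' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized MariaDB version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def needs_dump_and_restore(before: str, after: str) -> bool:
    """True when the update moves to a newer MariaDB major release.

    Assumption: a change in either of the first two version components
    counts as a major release change.
    """
    return major_minor(after) > major_minor(before)
```

&lt;p&gt;A patch-level update such as 10.1.14 to 10.1.16 would skip the dump, which keeps
routine updates fast while still covering the major version case.&lt;/p&gt;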
&lt;/section&gt;
&lt;section id="updates-vs-upgrades"&gt;
&lt;h3&gt;Updates vs. Upgrades&lt;/h3&gt;
&lt;p&gt;I am also proposing that we not differentiate between minor updates and major
upgrades on the undercloud.  Because we don’t need to be as concerned with
uptime there, any additional time required to treat all upgrades as a
potential major version upgrade should be negligible, and it avoids us
having to maintain and test multiple paths.&lt;/p&gt;
&lt;p&gt;Additionally, the difference between a major and minor upgrade becomes very
fuzzy for anyone upgrading between versions of master.  There may be db
or rpc changes that require the major upgrade flow anyway.  Also, the whole
argument assumes we can even come up with a sane, yet less-invasive update
strategy for the undercloud anyway, and I think our time is better spent
elsewhere.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;As shown in &lt;a class="reference internal" href="#don-t-update-whole-system-on-undercloud-upgrade"&gt;Don’t update whole system on undercloud upgrade&lt;/a&gt;, another
option is to limit the manual yum update to just instack-undercloud and make
Puppet responsible for updating everything else.  This would allow Puppet
to handle all of the upgrade logic internally.  As of this writing, there is
at least one significant problem with the patch as proposed because it does
not update the Puppet modules installed on the undercloud, which leaves us
in a chicken and egg situation with a newer instack-undercloud calling older
Puppet modules to run the update.  I believe this could be solved by also
updating the Puppet modules along with instack-undercloud.&lt;/p&gt;
&lt;p&gt;Drawbacks of this approach would be that each service needs to be orchestrated
correctly in Puppet (this could also be a feature, from a Puppet CI
perspective), and it does not automatically handle things like services moving
from standalone to httpd.  This could be mitigated by the undercloud upgrade
CI job catching most such problems before they merge.&lt;/p&gt;
&lt;p&gt;I still personally feel this is more complicated than the proposal above, but
I believe it could work, and as noted could have benefits for CI’ing upgrades
in Puppet modules.&lt;/p&gt;
&lt;p&gt;There is one other concern with this that is less a functional issue, which is
that it significantly alters our previous upgrade methods, and might be
problematic to backport as older versions of instack-undercloud were assuming
an external package update.  It’s probably not an insurmountable obstacle, but
I do feel it’s worth noting.  Either approach is going to require some amount
of backporting, but this may require backporting in non-tripleo Puppet modules
which may be more difficult to do.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;No significant security impact one way or another.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;This will likely have an impact on how a user runs undercloud upgrades,
especially compared to our existing documented upgrade method.
Ideally all of the implementation will happen behind the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;undercloud&lt;/span&gt;
&lt;span class="pre"&gt;upgrade&lt;/span&gt;&lt;/code&gt; command regardless of which approach is taken, but even that is a
change from before.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The method I am suggesting can do an undercloud upgrade in 20-25
minutes end-to-end in a scripted CI job.&lt;/p&gt;
&lt;p&gt;The performance impact of the Puppet approach is unknown to me.&lt;/p&gt;
&lt;p&gt;The performance of the existing method where service packages are updated with
the service still running is terrible - upwards of two hours for a full
upgrade in some cases, assuming the upgrade completes at all.  This is largely
due to the aforementioned problem with services restarting before their config
files have been updated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Same as the end user impact.  In this case I believe they’re the same person.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Discussed somewhat in the proposals, but I believe my approach is a little
simpler from the developer perspective.  They don’t have to worry about the
orchestration of the upgrade, they only have to provide a valid configuration
for a given version of OpenStack.  The one drawback is that if we add any new
services on the undercloud, their db-sync must be wired into the “always run
db-syncs” list.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;bnemec&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EmilienM&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other contributors (I’m essentially listing everyone who has been involved in
upgrade work so far):&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;lbezdick&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;bandini&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;marios&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;jistr&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement an undercloud upgrade CI job to test upgrades.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement the selected approach in the undercloud upgrade command.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;A CI job is already underway.  See &lt;a class="reference internal" href="#undercloud-upgrade-ci-job"&gt;Undercloud Upgrade CI Job&lt;/a&gt;.  This should
provide reasonable coverage on a per-patch basis.  We may also want to test
undercloud upgrades in periodic jobs to ensure that it is possible to deploy
an overcloud with an upgraded undercloud.  This probably takes too long to be
done in the regular CI jobs, however.&lt;/p&gt;
&lt;p&gt;There has also been discussion of running Tempest API tests on the upgraded
undercloud, but I’m unsure of the status of that work.  It would be good to
have in the standalone undercloud upgrade job though.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The docs will need to be updated to reflect the new upgrade method.  Hopefully
this will be as simple as “Run openstack undercloud upgrade”, but that remains
to be seen.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;section id="stop-all-services-before-upgrading"&gt;
&lt;h3&gt;Stop all services before upgrading&lt;/h3&gt;
&lt;p&gt;Code: &lt;a class="reference external" href="https://review.openstack.org/331804"&gt;https://review.openstack.org/331804&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Docs: &lt;a class="reference external" href="https://review.openstack.org/315683"&gt;https://review.openstack.org/315683&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Always db-sync: &lt;a class="reference external" href="https://review.openstack.org/#/c/346138/"&gt;https://review.openstack.org/#/c/346138/&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="don-t-update-whole-system-on-undercloud-upgrade"&gt;
&lt;h3&gt;Don’t update whole system on undercloud upgrade&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/327176"&gt;https://review.openstack.org/327176&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="upgrade-documentation-under-and-overcloud"&gt;
&lt;h3&gt;Upgrade documentation (under and overcloud)&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/308985"&gt;https://review.openstack.org/308985&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="undercloud-upgrade-ci-job"&gt;
&lt;h3&gt;Undercloud Upgrade CI Job&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/346995"&gt;https://review.openstack.org/346995&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</description><pubDate>Tue, 02 Aug 2016 00:00:00 </pubDate></item><item><title>Enable deployment of availability monitoring</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/tripleo-opstools-availability-monitoring.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-opstools-availability-monitoring"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-opstools-availability-monitoring&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;TripleO should deploy an out-of-the-box availability monitoring solution
to serve the overcloud.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently no such feature is implemented, apart from the possibility to deploy
sensu-server, sensu-api and uchiwa (Sensu dashboard) services in the undercloud
stack. Without sensu-client services deployed on overcloud nodes, this code
is useless. Because of its potentially high resource consumption, it is also
reasonable to remove the current undercloud code to avoid possible problems
when a large number of overcloud nodes is being deployed.&lt;/p&gt;
&lt;p&gt;Instead, sensu-server, sensu-api and uchiwa should be deployed on separate
node(s), whether at the undercloud level or at the overcloud level.
The sensu-client deployment support should therefore be flexible enough to
connect either to an external monitoring infrastructure or to a Sensu stack
deployed on a dedicated overcloud node.&lt;/p&gt;
&lt;p&gt;Summary of use cases:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;sensu-server, sensu-api and uchiwa deployed in external infrastructure;
sensu-client deployed on each overcloud node&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sensu-server, sensu-api and uchiwa deployed as a separate Heat stack in
the overcloud stack; sensu-client deployed on each overcloud node&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The sensu-client service will be deployed as a composable service on
the overcloud stack when it is explicitly enabled via an environment file.
Sensu checks will have to be configured as subscription checks (see [0]
for details). Each composable service will have its own subscription string,
which will ensure that checks defined on the Sensu server node (wherever it lives)
are run on the correct overcloud nodes.&lt;/p&gt;
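&lt;p&gt;The subscription mechanism described above can be illustrated with a small
Python sketch. It models only the matching rule (a check runs on a client when
the check's subscribers overlap the client's subscriptions) and does not use any
Sensu API. The check names and the function name are hypothetical.&lt;/p&gt;

```python
from typing import Dict, List, Set


def checks_for_client(
    checks: Dict[str, List[str]], subscriptions: Set[str]
) -> List[str]:
    """Return the names of checks that should run on one client.

    checks maps a check name to its list of subscribers; a check applies
    when at least one subscriber matches one of the client's subscriptions.
    """
    return sorted(
        name
        for name, subscribers in checks.items()
        if subscriptions.intersection(subscribers)
    )


# Hypothetical check definitions living on the Sensu server node.
CHECKS = {
    "check-nova-compute": ["overcloud-nova-compute"],
    "check-keystone": ["overcloud-keystone"],
    "check-ntp": ["overcloud-nova-compute", "overcloud-keystone"],
}

# A compute node subscribes through its composable service.
compute_checks = checks_for_client(CHECKS, {"overcloud-nova-compute"})
```

&lt;p&gt;Because routing depends only on the strings, the check definitions can stay on
the monitoring node while each overcloud node advertises the subscriptions of
the services it runs.&lt;/p&gt;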
&lt;p&gt;It will be possible to deploy the sensu-server, sensu-api
and uchiwa services on a standalone node deployed by the undercloud.
This standalone node will be dedicated to monitoring
(not only to availability monitoring services, but in the future also to
centralized logging services or performance monitoring services).&lt;/p&gt;
&lt;p&gt;The monitoring node will be deployed as a Heat stack separate from the overcloud
stack, using Puppet and composable roles for the required services.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;An additional service (sensu-client) will be installed on all overcloud nodes.
These clients will hold an open connection to the RabbitMQ instance running
on the monitoring node and are used to execute commands (checks) on the overcloud
nodes. Check definitions will live on the monitoring node.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;We might consider deploying separate RabbitMQ and Redis for monitoring purposes
if we want to avoid influencing OpenStack deployment in the overcloud.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Sensu clients will be deployed by default on all overcloud nodes except the monitoring node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New Sensu common parameters:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MonitoringRabbitHost&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;RabbitMQ host Sensu has to connect to&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MonitoringRabbitPort&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;RabbitMQ port Sensu has to connect to&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MonitoringRabbitUseSSL&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Whether Sensu should connect to RabbitMQ using SSL&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MonitoringRabbitPassword&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;RabbitMQ password used for Sensu to connect&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MonitoringRabbitUserName&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;RabbitMQ username used for Sensu to connect&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MonitoringRabbitVhost&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;RabbitMQ vhost used for monitoring purposes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New Sensu server/API parameters&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MonitoringRedisHost&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Redis host Sensu has to connect to&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MonitoringRedisPassword&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Redis password used for Sensu to connect&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MonitoringChecks:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Full definition (for all subscriptions) of checks performed by Sensu&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New parameters for subscription strings for each composable service:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;For example, the nova-compute service gets MonitoringSubscriptionNovaCompute, which will default to ‘overcloud-nova-compute’&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ul&gt;
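&lt;p&gt;The naming pattern suggested by the nova-compute example can be sketched in
Python. Only that one example comes from this spec; applying the same pattern to
other services is an assumption, and both function names are hypothetical.&lt;/p&gt;

```python
def subscription_parameter(service: str) -> str:
    """Turn 'nova-compute' into 'MonitoringSubscriptionNovaCompute'."""
    camel = "".join(part.capitalize() for part in service.split("-"))
    return "MonitoringSubscription" + camel


def default_subscription(service: str) -> str:
    """Turn 'nova-compute' into 'overcloud-nova-compute'."""
    return "overcloud-" + service
```

&lt;p&gt;Deriving both values from the service name keeps the parameter and its default
predictable, while still letting a deployer override the subscription string.&lt;/p&gt;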
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Support for the new node type should be implemented for tripleo-quickstart.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Martin Mágr &amp;lt;&lt;a class="reference external" href="mailto:mmagr%40redhat.com"&gt;mmagr&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;puppet-tripleo profile for Sensu services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;puppet-tripleo profile for Uchiwa service&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates composable service for sensu-client deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates composable service for sensu-server deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates composable service for sensu-api deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-heat-templates composable service for uchiwa deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for monitoring node in tripleo-quickstart&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Revert patch(es) implementing Sensu support in instack-undercloud&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Puppet module for Sensu services: sensu-puppet [1]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Puppet module for Uchiwa: puppet-uchiwa [2]&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CentOS Opstools SIG repo [3]&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Sensu client deployment will be tested by current TripleO CI as soon as
the patch is merged, as it will be deployed by default.&lt;/p&gt;
&lt;p&gt;We should consider creating a CI job that deploys an overcloud with a monitoring
node to test the rest of the monitoring components.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The process of creating the new node type and the new options will have to be documented.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[0] &lt;a class="reference external" href="https://sensuapp.org/docs/latest/reference/checks.html#subscription-checks"&gt;https://sensuapp.org/docs/latest/reference/checks.html#subscription-checks&lt;/a&gt;
[1] &lt;a class="reference external" href="https://github.com/sensu/sensu-puppet"&gt;https://github.com/sensu/sensu-puppet&lt;/a&gt;
[2] &lt;a class="reference external" href="https://github.com/Yelp/puppet-uchiwa"&gt;https://github.com/Yelp/puppet-uchiwa&lt;/a&gt;
[3] &lt;a class="reference external" href="https://wiki.centos.org/SpecialInterestGroup/OpsTools"&gt;https://wiki.centos.org/SpecialInterestGroup/OpsTools&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 26 Jul 2016 00:00:00 </pubDate></item><item><title>TripleO LLDP Validation</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/tripleo-lldp-validation.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-lldp-validation"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-lldp-validation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Link Layer Discovery Protocol (LLDP) is a vendor-neutral link layer
protocol in the Internet Protocol Suite used by network devices for
advertising their identity, capabilities, and neighbors on an
IEEE 802 local area network, principally wired Ethernet. [1]&lt;/p&gt;
&lt;p&gt;LLDP helps identify layer 1/2 connections between hosts and switches.
The switch port, chassis ID, trunked VLANs, and other information are
available for planning or troubleshooting a deployment. For instance,
a deployer may validate that the proper VLANs are supplied on a link,
or that all hosts are connected to the Provisioning network.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Deployment networking is one of the most difficult parts of any
OpenStack deployment. A single misconfigured port or loose cable
can derail an entire multi-rack deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Given the first point, we should work to automate validation and
troubleshooting where possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Work is underway to collect LLDP data in ironic-python-agent,
and we have an opportunity to make that data useful [2].&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The goal is to expose LLDP data that is collected during
introspection, and provide this data in a format that is useful for the
deployer. This work depends on the LLDP collection work being done
in ironic-python-agent [3].&lt;/p&gt;
&lt;p&gt;There is work being done to implement LLDP data collection for Ironic/
Neutron integration. Although this work is primarily focused on features
for bare-metal Ironic instances, there will be some overlap with the
way TripleO uses Ironic to provision overcloud servers.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There are many network management utilities that use CDP or LLDP data to
validate the physical networking. Some of these are open source, but none
are integrated with OpenStack.&lt;/p&gt;
&lt;p&gt;Alternative approaches that do not use LLDP are typically vendor-specific
and require specific hardware support. Cumulus has a solution which works
with multiple vendors’ hardware, but that solution requires running their
custom OS on the Ethernet switches.&lt;/p&gt;
&lt;p&gt;Another approach which is common is to perform collection of the switch
configurations to a central location, where port configurations can be
viewed, or in some cases even altered and remotely pushed. The problem
with this approach is that the switch configurations are hardware and
vendor-specific, and typically a network engineer is required to read
and interpret the configuration. A unified approach that works for all
common switch vendors is preferred, along with a unified reporting format.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The physical network report provides a roadmap to the underlying network
structure. This could prove handy to an attacker who was unaware of the
existing topology. On the other hand, the information about physical
network topology is less valuable than information about logical topology
to an attacker. LLDP contains some information about both physical and
logical topology, but the logical topology is limited to VLAN IDs.&lt;/p&gt;
&lt;p&gt;The network topology report should be considered sensitive but not
critical. No credentials or shared secrets are revealed in the data
collected by ironic-inspector.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;This report will hopefully reduce the troubleshooting time for nodes
with failed network deployments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;If this report is produced as part of the ironic-inspector workflow,
then it will increase the time taken to introspect each node by a
negligible amount, perhaps a few seconds.&lt;/p&gt;
&lt;p&gt;If this report is called by the operator on demand, it will have
no performance impact on other components.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Deployers may want more information than the per-node LLDP report provides.
There may be some use in providing aggregate reports, such as the number
of nodes with a specific configuration of interfaces and trunked VLANs.
This would help to highlight outliers or misconfigured nodes.&lt;/p&gt;
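&lt;p&gt;A minimal Python sketch of such an aggregate report follows. The input shape (node name mapped to interfaces, each with a list of VLAN IDs) and the function names are assumptions made for illustration, not part of the proposed tool.&lt;/p&gt;

```python
from collections import defaultdict


def group_by_configuration(nodes):
    """Group node names by their (interface, VLANs) layout.

    `nodes` maps a node name to {interface name: iterable of VLAN IDs}.
    This input shape is an assumption made for the example.
    """
    groups = defaultdict(list)
    for name, interfaces in nodes.items():
        signature = tuple(
            (iface, tuple(sorted(vlans)))
            for iface, vlans in sorted(interfaces.items())
        )
        groups[signature].append(name)
    return dict(groups)


def outliers(groups):
    """Return nodes whose configuration differs from the most common one."""
    most_common = max(groups.values(), key=len)
    return sorted(
        name
        for members in groups.values()
        if members is not most_common
        for name in members
    )


nodes = {
    "compute-0": {"em1": [10, 20], "em2": [30]},
    "compute-1": {"em1": [20, 10], "em2": [30]},
    "compute-2": {"em1": [10], "em2": [30]},
}
print(outliers(group_by_configuration(nodes)))
```

&lt;p&gt;With the sample data, compute-2 is reported because its em1 interface is missing VLAN 20.&lt;/p&gt;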
&lt;p&gt;There have been discussions about adding automated switch configuration
in TripleO. This would be a mechanism whereby deployers could produce the
Ethernet switch configuration with a script based on a configuration
template. The deployer would provide specifics like the number of nodes
and the configuration per node, and the script would generate the switch
configuration to match. In that case, the LLDP collection and analysis
would function as a validator for the automatically generated switch
port configurations.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;The initial work will be to fill in fixed fields such as Chassis ID
and switch port. An LLDP packet can contain additional data on a
per-vendor basis, however.&lt;/p&gt;
&lt;p&gt;The long-term plan is to store the entire LLDP packet in the
metadata. This will have to be parsed out. We may have to work with
switch vendors to figure out how to interpret some of the data if
we want to make full use of it.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Some notes about implementation:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;This Python tool will access the introspection data and produce
reports on various information such as VLANs per port, host-to-port
mapping, and MACs per host.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The introspection data can be retrieved with the Ironic API [4] [5].&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The data will initially be a set of fixed fields which are retrievable
from the JSON in the Ironic introspection data. Later, the entire
LLDP packet will be stored, and will need to be parsed outside of the
Ironic API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Although the initial implementation can return a human-readable report,
other outputs should be available for automation, such as YAML.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The tool that produces the LLDP report should be able to return data
on a single host, or return all of the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some basic support for searching would be a nice feature to have.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This data will eventually be used by the GUI to display as a validation
step in the deployment workflow.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
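&lt;p&gt;The notes above can be sketched as a small Python example. The JSON layout in SAMPLE is hypothetical (the real introspection data may use different keys), and JSON stands in for the machine-readable output because the Python standard library has no YAML writer.&lt;/p&gt;

```python
import json

# Hypothetical layout: the real introspection JSON may name these keys
# differently. Only the reporting idea is taken from the spec.
SAMPLE = """
{
  "inventory": {
    "interfaces": [
      {"name": "em1", "mac_address": "52:54:00:aa:bb:01",
       "lldp": {"chassis_id": "00:1c:73:00:00:01",
                "port_id": "Ethernet1", "vlans": [10, 20]}},
      {"name": "em2", "mac_address": "52:54:00:aa:bb:02", "lldp": null}
    ]
  }
}
"""


def lldp_rows(introspection):
    """Yield one flat dict per interface that reported LLDP data."""
    for iface in introspection["inventory"]["interfaces"]:
        lldp = iface.get("lldp")
        if not lldp:
            continue  # no LLDP neighbour seen on this interface
        yield {
            "interface": iface["name"],
            "mac": iface["mac_address"],
            "chassis_id": lldp.get("chassis_id"),
            "port_id": lldp.get("port_id"),
            "vlans": sorted(lldp.get("vlans", [])),
        }


def render(node, rows, fmt="text"):
    """Return the report as human-readable text or as JSON for automation."""
    rows = list(rows)
    if fmt == "json":
        return json.dumps({node: rows}, indent=2, sort_keys=True)
    lines = ["Node " + node]
    for row in rows:
        vlans = ",".join(str(v) for v in row["vlans"]) or "-"
        lines.append(
            "  {interface} ({mac}) to {chassis_id} port {port_id} "
            "VLANs {v}".format(v=vlans, **row)
        )
    return "\n".join(lines)


print(render("compute-0", lldp_rows(json.loads(SAMPLE))))
```

&lt;p&gt;Calling render with fmt="json" returns the same rows in a form that other tools could consume.&lt;/p&gt;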
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;dsneddon &amp;lt;&lt;a class="reference external" href="mailto:dsneddon%40redhat.com"&gt;dsneddon&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bfournie &amp;lt;&lt;a class="reference external" href="mailto:bfournie%40redhat.com"&gt;bfournie&lt;span&gt;@&lt;/span&gt;redhat&lt;span&gt;.&lt;/span&gt;com&lt;/a&gt;&amp;gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create the Python script to grab introspection data from Swift using
the API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create the Python code to extract the relevant LLDP data from the
data JSON.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement per-node reports&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement aggregate reports&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interface with UI developers to give them the data in a form that can
be consumed and presented by the TripleO UI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the future, when the entire LLDP packet is stored, refactor logic
to take this into account.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Since this is a report that is supposed to benefit the operator, perhaps
the best way to include it in CI is to make sure that the report gets
logged by the Undercloud. Then the report can be reviewed in the log
output from the CI run.&lt;/p&gt;
&lt;p&gt;In fact, this might benefit the TripleO CI process, since hardware issues
on the network would be easier to troubleshoot without having access to
the bare metal console.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation will need to be written to cover making use of the new
LLDP reporting tool. This should cover running the tool by hand and
interpreting the data.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;[1] - Wikipedia entry on LLDP:
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Link_Layer_Discovery_Protocol"&gt;https://en.wikipedia.org/wiki/Link_Layer_Discovery_Protocol&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[2] - Blueprint for Ironic/Neutron integration:
&lt;a class="reference external" href="https://blueprints.launchpad.net/ironic/+spec/ironic-ml2-integration"&gt;https://blueprints.launchpad.net/ironic/+spec/ironic-ml2-integration&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[3] - Review: Support LLDP data as part of interfaces in inventory
&lt;a class="reference external" href="https://review.openstack.org/#/c/320584/"&gt;https://review.openstack.org/#/c/320584/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[4] - Accessing Ironic Introspection Data
&lt;a class="reference external" href="http://tripleo.org/advanced_deployment/introspection_data.html"&gt;http://tripleo.org/advanced_deployment/introspection_data.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[5] - Ironic API - Get Introspection Data
&lt;a class="reference external" href="http://docs.openstack.org/developer/ironic-inspector/http-api.html#get-introspection-data"&gt;http://docs.openstack.org/developer/ironic-inspector/http-api.html#get-introspection-data&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Mon, 13 Jun 2016 00:00:00 </pubDate></item><item><title>Metal to Tenant: Ironic in Overcloud</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/metal-to-tenant.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/ironic-integration"&gt;https://blueprints.launchpad.net/tripleo/+spec/ironic-integration&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This blueprint adds support for providing bare metal machines to tenants by
integrating Ironic into the overcloud.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;There is an increasing interest in providing bare metal machines to tenants in
the overcloud in addition to or instead of virtual instances. One example is
Sahara: users hope to achieve better performance by removing the hypervisor
abstraction layer in order to eliminate the noisy neighbor effect. For that
purpose, the OpenStack Bare metal service (Ironic) provides an API and a Nova
driver to serve bare metal instances behind the same Nova and Neutron APIs.
Currently, however, TripleO does not support installing and configuring Ironic
and Nova to serve bare metal instances to the tenant.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="composable-services"&gt;
&lt;h3&gt;Composable Services&lt;/h3&gt;
&lt;p&gt;In the bare metal deployment case, the nova-compute service is only a thin
abstraction layer around the Ironic API. The actual compute instances in
this case are the bare metal nodes. Thus a TripleO deployment with support for
only bare metal nodes will not need dedicated compute nodes in the overcloud.
The overcloud nova-compute service will therefore be placed on controller nodes.&lt;/p&gt;
&lt;p&gt;New TripleO composable services will be created and optionally deployed on the
controller nodes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Services::IronicApi&lt;/span&gt;&lt;/code&gt; will deploy the bare metal API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Services::IronicNovaCompute&lt;/span&gt;&lt;/code&gt; will deploy nova compute
with Ironic as a back end. It will also configure the nova compute to use
&lt;a class="reference external" href="https://github.com/openstack/ironic/blob/master/ironic/nova/compute/manager.py"&gt;ClusteredComputeManager&lt;/a&gt;
provided by Ironic to work around the inability to have several nova compute
instances configured with Ironic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Services::IronicConductor&lt;/span&gt;&lt;/code&gt; will deploy a TFTP server,
an HTTP server (for an optional iPXE environment) and an ironic-conductor
instance. The ironic-conductor instance will not be managed by pacemaker
in the HA scenario, as Ironic has its own Active/Active HA model,
which spreads load on all active conductors using a hash ring.&lt;/p&gt;
&lt;p&gt;There is no public data on how many bare metal nodes each conductor
can handle, but the Ironic team expects on the order of hundreds of nodes
per conductor.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
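&lt;p&gt;To illustrate how a hash ring spreads nodes across active conductors, here is a generic consistent-hashing sketch in Python. It is not Ironic's implementation; the class name, the replica count, and the use of SHA-256 are assumptions chosen for the example.&lt;/p&gt;

```python
import bisect
import hashlib
from collections import Counter


class HashRing:
    """Minimal consistent hash ring mapping node IDs to conductors."""

    def __init__(self, conductors, replicas=64):
        self._ring = []  # sorted list of (hash position, conductor)
        for conductor in conductors:
            # Several points per conductor even out the distribution.
            for i in range(replicas):
                key = "{}-{}".format(conductor, i)
                self._ring.append((self._hash(key), conductor))
        self._ring.sort()
        self._positions = [position for position, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def conductor_for(self, node_id):
        """Return the conductor responsible for the given node."""
        index = bisect.bisect(self._positions, self._hash(node_id))
        # Wrap around to the first point when past the end of the ring.
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["conductor-0", "conductor-1", "conductor-2"])
node_ids = ["node-{}".format(i) for i in range(300)]
print(Counter(ring.conductor_for(node_id) for node_id in node_ids))
```

&lt;p&gt;When a conductor is removed from such a ring, only the nodes it owned are reassigned; the remaining nodes keep their conductor.&lt;/p&gt;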
&lt;p&gt;Since this feature is not a requirement in all deployments, this will be
opt-in by having a separate environment file.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="hybrid-deployments"&gt;
&lt;h3&gt;Hybrid Deployments&lt;/h3&gt;
&lt;p&gt;For hybrid deployments with both virtual and bare metal instances, we will use
Nova host aggregates: one for all bare metal hosts, the other for all virtual
compute nodes. This will prevent virtual instances from being deployed on bare metal
nodes. Note that every bare metal machine is presented as a separate
Nova compute host. These host aggregates will always be created, even for
purely bare metal deployments, as users might want to add virtual computes
later.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="networking"&gt;
&lt;h3&gt;Networking&lt;/h3&gt;
&lt;p&gt;As of Mitaka, Ironic only supports flat networking for all tenants and for
provisioning. The &lt;strong&gt;recommended&lt;/strong&gt; deployment layout will consist of two networks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;provisioning&lt;/span&gt;&lt;/code&gt; / &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tenant&lt;/span&gt;&lt;/code&gt; network. It must have access to the
overcloud Neutron service for DHCP, and to overcloud baremetal-conductors
for provisioning.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;While this network can technically be the same as the undercloud
provisioning network, it’s not recommended to do so due to
potential conflicts between various DHCP servers provided by
Neutron (and in the future by ironic-inspector).&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;management&lt;/span&gt;&lt;/code&gt; network. It will contain the BMCs of bare metal nodes,
and it only needs access to baremetal-conductors. No tenant access will be
provided to this network.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Splitting away this network is not really required if tenants are
trusted (which is assumed in this spec) and BMC access is
reasonably restricted.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="limitations"&gt;
&lt;h3&gt;Limitations&lt;/h3&gt;
&lt;p&gt;To limit the scope of this spec, the following features, although useful, are
explicitly left out for now:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;provision&lt;/span&gt;&lt;/code&gt; &amp;lt;-&amp;gt; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;tenant&lt;/span&gt;&lt;/code&gt; network separation (not yet implemented by
ironic)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;in-band inspection (requires ironic-inspector, which is not yet HA-ready)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;untrusted tenants (requires configuring secure boot and checking firmwares,
which is vendor-dependent)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;node autodiscovery (depends on ironic-inspector)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Alternatively, we could leave configuring a metal-to-tenant environment up to
the operator.&lt;/p&gt;
&lt;p&gt;We could also have it enabled by default, but most likely it won’t be required
in most deployments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Most of the security implications, e.g. wiping the hard disk and checking
firmware, have to be handled within Ironic. Ironic needs to be configured to
run these jobs by enabling automatic cleaning during the node lifecycle.
It is also worth mentioning that we will assume trusted tenants for these bare
metal machines.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The ability to deploy Ironic in the overcloud will be optional.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;If enabled, TripleO will deploy additional services to the overcloud:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ironic-conductor&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;a TFTP server&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;an HTTP server&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;None of these should have heavy performance requirements.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;ifarkas&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;dtantsur, lucasagomes, mgould, mkovacik&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;When the environment file is included, make sure:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ironic is deployed on baremetal-conductor nodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nova compute is deployed and correctly configured, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;configuring Ironic as a virt driver&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;configuring ClusteredComputeManager&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;setting ram_allocation_ratio to 1.0&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;host aggregates are created&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;update documentation&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;This is testable in the CI with nested virtualization and tests will be added
to the tripleo-ci jobs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Quick start documentation and a sample environment file will be provided.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document how to enroll new nodes in overcloud ironic (including host
aggregates)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/4/html/Configuration_Reference_Guide/host-aggregates.html"&gt;Host aggregates&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 25 May 2016 00:00:00 </pubDate></item><item><title>Adding OVS-DPDK to Tripleo</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/tripleo-ovs-dpdk.html</link><description>&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0 Unported
License.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;section id="adding-ovs-dpdk-to-tripleo"&gt;
 
&lt;p&gt;Blueprint URL -
&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-ovs-dpdk"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-ovs-dpdk&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;DPDK is a set of libraries and drivers for fast packet processing and gets as
close to wire-line speed as possible for virtual machines.&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;It is a complete framework for fast packet processing in data plane
applications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Directly polls the data from the NIC.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Does not use interrupts - to prevent performance overheads.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses the hugepages to preallocate large regions of memory, which allows the
applications to DMA data directly into these pages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DPDK also has its own buffer and ring management systems for handling
sk_buffs efficiently.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;DPDK provides data plane libraries and NIC drivers for -&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Queue management with lockless queues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Buffer manager with pre-allocated fixed size buffers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PMD (poll mode drivers) to work without asynchronous notifications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Packet framework (set of libraries) to help data plane packet processing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory manager - allocates pools of memory, uses a ring to store free
objects.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Today the installation and configuration of OVS+DPDK in OpenStack is done
manually after overcloud deployment. This can be very challenging for the
operator and tedious to do over a large number of compute nodes.
The installation of OVS+DPDK needs to be automated in TripleO.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Identification of the hardware capabilities for DPDK is currently done
manually and shall be automated during introspection. This hardware
detection also provides the operator with the data needed for configuring
Heat templates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As of today, it is not possible for compute nodes with DPDK-enabled hardware
to coexist with compute nodes without DPDK-enabled hardware.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ironic Python Agent shall discover the hardware details below and store them
in a Swift blob -&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CPU flags for hugepages support -
If pse exists then 2MB hugepages are supported
If pdpe1gb exists then 1GB hugepages are supported&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU flags for IOMMU -
If VT-d/svm exists, then IOMMU is supported, provided IOMMU support is
enabled in BIOS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compatible NICs -
The detected NICs shall be compared with the list of NICs whitelisted for DPDK. The DPDK
supported NICs are available at &lt;a class="reference external" href="http://dpdk.org/doc/nics"&gt;http://dpdk.org/doc/nics&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Nodes that lack any of the capabilities above cannot be used for the
COMPUTE role with DPDK.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The operator shall be able to enable DPDK on compute nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The overcloud image for the nodes identified to be COMPUTE capable and having
DPDK NICs, shall have the OVS+DPDK package instead of OVS. It shall also have
packages dpdk and driverctl.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The device names of the DPDK-capable NICs shall be obtained from T-H-T.
The PCI address of each DPDK NIC needs to be identified from the device name.
It is required for whitelisting the DPDK NICs during PCI probe.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hugepages need to be enabled on the Compute nodes with DPDK.
Bug: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1589929"&gt;https://bugs.launchpad.net/tripleo/+bug/1589929&lt;/a&gt; needs to be implemented.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU isolation needs to be done so that the CPU cores reserved for DPDK Poll
Mode Drivers (PMD) are not used by the general kernel balancing,
interrupt handling and scheduling algorithms.
Bug: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1589930"&gt;https://bugs.launchpad.net/tripleo/+bug/1589930&lt;/a&gt; needs to be implemented.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On each COMPUTE node with a DPDK-enabled NIC, puppet shall configure the
DPDK_OPTIONS for whitelisted NICs, CPU mask and number of memory channels
for DPDK PMD. The DPDK_OPTIONS need to be set in /etc/sysconfig/openvswitch.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Os-net-config shall -&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Associate the given interfaces with the dpdk drivers (default as vfio-pci
driver) by identifying the pci address of the given interface. The
driverctl shall be used to bind the driver persistently&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Understand the ovs_user_bridge and ovs_dpdk_port types and configure the
ifcfg scripts accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The “TYPE” ovs_user_bridge shall translate to OVS type OVSUserBridge and
based on this OVS will configure the datapath type to ‘netdev’.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The “TYPE” ovs_dpdk_port shall translate to OVS type OVSDPDKPort and based on
this OVS adds the port to the bridge with interface type as ‘dpdk’.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Understand the ovs_dpdk_bond and configure the ifcfg scripts accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On each COMPUTE node with DPDK enabled NIC, puppet shall -&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Enable OVS+DPDK in /etc/neutron/plugins/ml2/openvswitch_agent.ini
[OVS]
datapath_type=netdev
vhostuser_socket_dir=/var/run/openvswitch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure vhostuser ports in /var/run/openvswitch to be owned by qemu.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On each controller node, puppet shall -&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add NUMATopologyFilter to scheduler_default_filters in nova.conf.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
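&lt;p&gt;Two of the steps above lend themselves to a short Python sketch: deriving the supported hugepage sizes from the CPU flags (pse and pdpe1gb, as listed above) and building a hexadecimal CPU mask from the cores reserved for the PMD threads. The function names are hypothetical, and the assumption that the CPU mask is passed as a hexadecimal bitmask is not stated in this spec.&lt;/p&gt;

```python
def hugepage_sizes(cpu_flags):
    """Map a space-separated CPU flag string to supported hugepage sizes."""
    flags = set(cpu_flags.split())  # exact match, so pse36 is not pse
    sizes = []
    if "pse" in flags:
        sizes.append("2MB")
    if "pdpe1gb" in flags:
        sizes.append("1GB")
    return sizes


def cpu_mask(cores):
    """Return a hexadecimal bitmask with one bit set per CPU core number."""
    mask = 0
    for core in cores:
        mask |= 2 ** core
    return hex(mask)


print(hugepage_sizes("fpu vme pse pse36 pdpe1gb"))  # ['2MB', '1GB']
print(cpu_mask([2, 3]))  # 0xc
```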
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The boot parameters could be configured via puppet (during overcloud
deployment) or via virt-customize (after image building or downloading).
The choice of how the boot parameters are set is out of scope of this
blueprint and is tracked via
&lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1589930"&gt;https://bugs.launchpad.net/tripleo/+bug/1589930&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We have no firewall drivers which support ovs-dpdk at present. Security group
support with conntrack is a possible option, and this work is in progress.
As a result, security groups will not be supported.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;OVS-DPDK can increase dataplane performance threefold.
Refer to &lt;a class="reference external" href="http://goo.gl/Du1EX2"&gt;http://goo.gl/Du1EX2&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The operator shall ensure that the VT-d/IOMMU virtualization technology is
enabled in BIOS of the compute nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Post deployment, the operator shall modify the VM flavors to use hugepages
and CPU pinning.
Ex: nova flavor-key m1.small set "hw:mem_page_size=large"&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;karthiks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sanjayu&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The proposed changes discussed earlier will be the work items&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We are dependent on composable roles, as this is something we would
require only on specific compute nodes and not generally on all the nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To enable Hugepages, bug: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1589929"&gt;https://bugs.launchpad.net/tripleo/+bug/1589929&lt;/a&gt;
needs to be implemented.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To address boot parameter changes for CPU isolation,
bug: &lt;a class="reference external" href="https://bugs.launchpad.net/tripleo/+bug/1589930"&gt;https://bugs.launchpad.net/tripleo/+bug/1589930&lt;/a&gt; needs to be implemented&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Since DPDK needs specific hardware support, this feature cannot be tested under
CI. We will need third party CI for validating it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Manual steps that need to be done by the operator shall be documented.
Ex: configuring BIOS for VT-d, adding boot parameters for CPU isolation and
hugepages, post-deployment configurations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="refrences"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Manual steps to set up DPDK in Red Hat OpenStack Platform 8
&lt;a class="reference external" href="https://goo.gl/6ymmJI"&gt;https://goo.gl/6ymmJI&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Setup procedure for CPU pinning and NUMA topology
&lt;a class="reference external" href="http://goo.gl/TXxuhv"&gt;http://goo.gl/TXxuhv&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DPDK-supported NICs
&lt;a class="reference external" href="http://dpdk.org/doc/nics"&gt;http://dpdk.org/doc/nics&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
</description><pubDate>Sat, 07 May 2016 00:00:00 </pubDate></item><item><title>Adding SR-IOV to Tripleo</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/tripleo-sriov.html</link><description>&lt;p&gt;This work is licensed under a Creative Commons Attribution 3.0 Unported
License.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;http://creativecommons.org/licenses/by/3.0/legalcode&lt;/a&gt;&lt;/p&gt;
&lt;section id="adding-sr-iov-to-tripleo"&gt;
 
&lt;dl class="simple"&gt;
&lt;dt&gt;Blueprint URL:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-sriov"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-sriov&lt;/a&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;SR-IOV is a specification that extends the PCI Express specification and allows
a PCIe device to appear to be multiple separate physical PCIe devices.&lt;/p&gt;
&lt;p&gt;SR-IOV provides one or more Virtual Functions (VFs) and a Physical Function (PF):&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Virtual Functions (VFs) are ‘lightweight’ PCIe functions that contain the
resources necessary for data movement but have a carefully minimized set
of configuration resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Physical Functions (PFs) are full PCIe functions that include the SR-IOV
Extended Capability. This capability is used to configure and manage
the SR-IOV functionality.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;VFs can be attached to VMs like dedicated PCIe devices, so using SR-IOV
NICs boosts networking performance considerably.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Today the installation and configuration of the SR-IOV feature is done
manually after overcloud deployment. It shall be automated via TripleO.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Identification of the hardware capabilities for SR-IOV is done manually
today and shall be automated during introspection. The hardware
detection also provides the operator with the data needed for configuring Heat
templates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ironic Python Agent will discover the hardware details below and store
them in a Swift blob -&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;SR-IOV capable NICs:
Shall read /sys/bus/pci/devices/…/sriov_totalvfs and check whether it is
non-zero, in order to identify whether the NIC is SR-IOV capable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VT-d or IOMMU support in BIOS:
The CPU flags shall be read to identify the support.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DIB shall include the package by default - openstack-neutron-sriov-nic-agent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nodes that lack the above-mentioned capabilities cannot be used for the
COMPUTE role with SR-IOV.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SR-IOV drivers shall be loaded during bootup via persistent module loading
scripts. These persistent module loading scripts shall be created by the
puppet manifests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;T-H-T shall provide the below details&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;supported_pci_vendor_devs - configure the vendor-id/product-id couples in
the nodes running neutron-server&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;max number of VFs - persistent across reboots&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;physical device mappings - add physical device mappings to the
ml2_conf_sriov.ini file on the compute node&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On the nodes running the Neutron server, puppet shall&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;enable sriovnicswitch in the /etc/neutron/plugin.ini file
mechanism_drivers = openvswitch,sriovnicswitch
This configuration enables the SR-IOV mechanism driver alongside
OpenvSwitch.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set the VLAN range for SR-IOV in the file /etc/neutron/plugin.ini, present
on the network node
network_vlan_ranges = &amp;lt;physical network name SR-IOV interface&amp;gt;:&amp;lt;VLAN min&amp;gt;:&amp;lt;VLAN max&amp;gt;
Ex: network_vlan_ranges = fabric0:15:20&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure the vendor-id/product-id pairs if they differ from
“15b3:1004,8086:10ca” in /etc/neutron/plugins/ml2/ml2_conf_sriov.ini
supported_pci_vendor_devs = 15b3:1004,8086:10ca,&amp;lt;vendor-id:product-id&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure neutron-server.service to use the ml2_conf_sriov.ini file
[Service] Type=notify User=neutron ExecStart=/usr/bin/neutron-server
--config-file /usr/share/neutron/neutron-dist.conf --config-file
/etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini
--config-file /etc/neutron/plugins/ml2/ml2_conf_sriov.ini --log-file
/var/log/neutron/server.log&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the nodes running nova scheduler, puppet shall&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;add PciPassthroughFilter to the list of scheduler_default_filters.
This is needed to allow proper scheduling of SR-IOV devices.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On each COMPUTE+SRIOV node, puppet shall configure /etc/nova/nova.conf&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Associate the available VFs with each physical network
Ex: pci_passthrough_whitelist={"devname": "enp5s0f1",
"physical_network":"fabric0"}&lt;/p&gt;
&lt;p&gt;PCI passthrough whitelist entries use the following syntax: ["device_id":
"&amp;lt;id&amp;gt;",] ["product_id": "&amp;lt;id&amp;gt;",] ["address":
"[[[[&amp;lt;domain&amp;gt;]:]&amp;lt;bus&amp;gt;]:][&amp;lt;slot&amp;gt;][.[&amp;lt;function&amp;gt;]]" | "devname": "Ethernet
Interface Name",] "physical_network":"Network label string"&lt;/p&gt;
&lt;p&gt;VFs that need to be excluded from agent configuration shall be added in
[sriov_nic]/exclude_devices. T-H-T shall configure this.&lt;/p&gt;
&lt;p&gt;Multiple whitelist entries per host are supported.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Puppet shall&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Set the number of VFs to the value configured by the operator
echo required_max_vfs &amp;gt; /sys/bus/pci/devices/…/sriov_numvfs
Puppet will also validate required_max_vfs, so that it does not exceed
the maximum supported by the device.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable NoopFirewallDriver in the
‘/etc/neutron/plugins/ml2/sriov_agent.ini’ file.&lt;/p&gt;
&lt;p&gt;[securitygroup]
firewall_driver = neutron.agent.firewall.NoopFirewallDriver&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add mappings to the /etc/neutron/plugins/ml2/sriov_agent.ini file.  Ex:
physical_device_mappings = fabric0:enp4s0f1
In this example, fabric0 is the physical network, and enp4s0f1 is the
physical function&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Puppet shall start the SR-IOV agent on Compute&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;systemctl enable  neutron-sriov-nic-agent.service&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;systemctl start neutron-sriov-nic-agent.service&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
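&lt;p&gt;The capability check and the VF count validation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the device directory is passed in by the caller, and a temporary directory stands in for the sysfs device path so the example can run without SR-IOV hardware.&lt;/p&gt;

```python
import os
import tempfile


def read_total_vfs(device_dir):
    """Return the value of sriov_totalvfs, or 0 when the file is absent."""
    path = os.path.join(device_dir, "sriov_totalvfs")
    if not os.path.exists(path):
        return 0
    with open(path) as handle:
        return int(handle.read().strip())


def is_sriov_capable(device_dir):
    """A NIC is treated as SR-IOV capable when sriov_totalvfs is non-zero."""
    return read_total_vfs(device_dir) != 0


def set_num_vfs(device_dir, required_max_vfs):
    """Validate the requested VF count, then write it to sriov_numvfs."""
    total = read_total_vfs(device_dir)
    # Reject values outside 0..total so the device limit is never exceeded.
    if required_max_vfs not in range(0, total + 1):
        raise ValueError(
            "requested %d VFs, device supports %d" % (required_max_vfs, total)
        )
    with open(os.path.join(device_dir, "sriov_numvfs"), "w") as handle:
        handle.write(str(required_max_vfs))


if __name__ == "__main__":
    # A temporary directory stands in for the real sysfs device directory.
    with tempfile.TemporaryDirectory() as fake_device:
        with open(os.path.join(fake_device, "sriov_totalvfs"), "w") as handle:
            handle.write("8\n")
        print(is_sriov_capable(fake_device))
        set_num_vfs(fake_device, 4)
```

&lt;p&gt;Validating before writing mirrors the requirement that required_max_vfs must not go beyond the supported maximum on the device.&lt;/p&gt;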
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We have no firewall drivers which support SR-IOV at present.
Security groups will be disabled only for SR-IOV ports in compute hosts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;SR-IOV provides near native I/O performance for each virtual machine on a
physical server. Refer - &lt;a class="reference external" href="http://goo.gl/HxZvXX"&gt;http://goo.gl/HxZvXX&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The operator shall ensure that the BIOS supports VT-d/IOMMU virtualization
technology on the compute nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IOMMU needs to be enabled on the Compute+SR-IOV nodes. Boot parameters
(intel_iommu=on or amd_iommu=pt) shall be added to grub.conf using the
first boot scripts (THT).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Post deployment, the operator shall&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create neutron ports prior to creating VMs (nova boot)
neutron port-create fabric0_0 --name sr-iov --binding:vnic-type direct&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create the VM with the required flavor and SR-IOV port id
Ex: nova boot --flavor m1.small --image &amp;lt;image id&amp;gt; --nic port-id=&amp;lt;port id&amp;gt;
vnf0&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;karthiks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sanjayu&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Documented above in the Proposed changes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We are dependent on composable roles, because the SR-IOV specific changes
are required only on specific compute nodes and not generally on all the
nodes. Blueprint -
&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles"&gt;https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Since SR-IOV needs specific hardware support, this feature cannot be tested
under CI. We will need third party CI for validating it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Manual steps that need to be done by the operator shall be documented.
Ex: configuring BIOS for VT-d, IOMMU, post-deployment configurations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="refrences"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;SR-IOV support for virtual networking
&lt;a class="reference external" href="https://goo.gl/eKP1oO"&gt;https://goo.gl/eKP1oO&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable SR-IOV functionality available in OpenStack
&lt;a class="reference external" href="http://docs.openstack.org/liberty/networking-guide/adv_config_sriov.html"&gt;http://docs.openstack.org/liberty/networking-guide/adv_config_sriov.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introduction to SR-IOV
&lt;a class="reference external" href="http://goo.gl/m7jP3"&gt;http://goo.gl/m7jP3&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Setup procedure for CPU pinning and NUMA topology
&lt;a class="reference external" href="http://goo.gl/TXxuhv"&gt;http://goo.gl/TXxuhv&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;/sys/bus/pci/devices/…/sriov_totalvfs - This file appears when a physical
PCIe device supports SR-IOV.
&lt;a class="reference external" href="https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci"&gt;https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
</description><pubDate>Sat, 07 May 2016 00:00:00 </pubDate></item><item><title>Refactor top level puppet manifests</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/refactor-puppet-manifests.html</link><description>
 
&lt;p&gt;Launchpad blueprint:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests"&gt;https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The current overcloud controller puppet manifests duplicate a large amount
of code between the pacemaker (HA) and non-ha version. We can reduce the
effort required to add new features by refactoring this code, and since
there is already a puppet-tripleo module this is the logical destination.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Large amounts of puppet/manifests/overcloud_controller.pp are shared with
puppet/manifests/overcloud_controller_pacemaker.pp. When adding a feature
or fixing a mistake in the former, the same change is frequently needed in the
latter. This violates the common programming principle of DRY, which,
while not an inviolable rule, is usually considered good practice.&lt;/p&gt;
&lt;p&gt;In addition, moving this code into separate classes in another module will
make it simpler to enable/disable components, as it will be a matter of
merely controlling which classes (profiles) are included.&lt;/p&gt;
&lt;p&gt;Finally, it allows easier experimentation with modifying the ‘ha strategy’.
Currently this is done using ‘step’, but could in theory be done using a
service registry. By refactoring into ha+non-ha classes this would be quite
simple to swap in/out.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;While there are significant differences in ha and non-ha deployments, in almost
all cases the ha code will be a superset of the non-ha. A simple example of
this is at the top of both files, where the load balancer is handled. The
non-ha version simply includes the loadbalancing class, while the HA version
instantiates the exact same class but with some parameters changed. Across
the board the same classes are included for the openstack services, but with
manage service set to false in the HA case.&lt;/p&gt;
&lt;p&gt;I propose first breaking up the non-ha version into profiles which can reside
in puppet-tripleo/manifests/profile/nonha, then adding ha versions which
use those classes under puppet-tripleo/manifests/profile/pacemaker. Pacemaker
could be described as an ‘ha strategy’ which in theory should be replaceable.
For this reason we use a pacemaker subfolder since one day perhaps we’ll have
an alternative.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We could leave things as they are, which works and isn’t the end of the world,
but it’s probably not optimal.&lt;/p&gt;
&lt;p&gt;We could use kolla or something that removes the need for puppet entirely, but
this discussion is outside the scope of this spec.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;It will make downstreams happy since they can sub in/out classes more easily.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Adding wrapper classes isn’t going to impact puppet compile times very much.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Changes in t-h-t and puppet-tripleo will often be coupled, as t-h-t
defines the data on which puppet-tripleo depends.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;michaeltchapman&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
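&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Move overcloud controller to profile classes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move overcloud controller pacemaker to profile classes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move any other classes from the smaller manifests in t-h-t&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;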
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;No new features so current tests apply in their entirety.
Additional testing can be added for each profile class&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 01 Mar 2016 00:00:00 </pubDate></item><item><title>Library support for TripleO Overcloud Deployment Via Mistral</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-mistral-deployment-library.html</link><description>
 
&lt;p&gt;We need a TripleO library that supports the overcloud deployment workflow.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;TripleO has an overcloud deployment workflow that uses Heat templates and
follows these steps:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The user edits the templates and environment file.  These can be stored
anywhere.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Templates may be validated by Heat.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Templates and environment are sent to Heat for overcloud deployment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This workflow is already supported by the CLI.&lt;/p&gt;
&lt;p&gt;However from a GUI perspective, although the workflow is straightforward, it is
not simple.  Here are some of the complications that arise:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Some of the business logic in this workflow is contained in the CLI itself,
making it difficult for other UIs to use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the TripleO overcloud deployment workflow changes, it is easy for the CLI
and GUI approach to end up on divergent paths - a dangerous situation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The CLI approach allows open-ended flexibility (the CLI doesn’t care where
the templates come from) that is detrimental for a GUI (the GUI user doesn’t
care where the templates are stored, but consistency in approach is desirable
to prevent divergence among GUIs and CLIs).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is a need to create common code that accommodates the flexibility of the
CLI with the ease-of-use needs of GUI consumers.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;In order to solve this problem, we propose to create a Mistral-integrated
deployment with the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Encapsulate the business logic involved in the overcloud deployment workflow
within the tripleo-common library utilizing Mistral actions and workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a simplified workflow to hide unneeded complexity from GUI consumers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the CLI to use this code where appropriate to prevent divergence with
GUIs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first three points deserve further explanation.  First, let us lay out the
proposed GUI workflow.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;A user pushes the Heat deployment templates into swift.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The user defines values for the template resource types given by Heat
template capabilities which are stored in an environment[1]. Note that this
spec will be completed by mitaka at the earliest.  A workaround is discussed
below.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Now that the template resource types are specified, the user can configure
deployment parameters given by Heat.  Edited parameters are updated and are
stored in an environment.  ‘Roles’ can still be derived from available Heat
parameters[2].&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Steps 2 and 3 can be repeated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With configuration complete, the user triggers the deployment of the
overcloud.  The templates and environment file are taken from Swift
and sent to Heat.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once overcloud deployment is complete, any needed post-deploy config is
performed.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The CLI and GUI will both use the Swift workflow and store the templates into
Swift.  This would facilitate the potential to switch to the UI from a CLI based
deployment and vice-versa.&lt;/p&gt;
&lt;p&gt;Mistral Workflows are composed of Tasks, which group together one or more
Actions to be executed with a Workflow Execution.  The Action is implemented as
a class with an initialization method and a run method.  The run method provides
a single execution point for Python code.  Any persistence of state required for
Actions or Workflows will be stored in a Mistral Environment object.&lt;/p&gt;
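&lt;p&gt;To make the shape of such an Action concrete, here is a minimal Python sketch with an initialization method and a run method. It is an assumption-laden illustration: the class name ListPlansAction is hypothetical, the class does not derive from the Mistral action base class, and a plain dictionary stands in for the Mistral Environment store.&lt;/p&gt;

```python
class ListPlansAction:
    """Hypothetical action: list the plans known to an environment store."""

    def __init__(self, environments):
        # Initialization receives the inputs the action needs.
        self.environments = environments

    def run(self):
        # Single execution point: return the sorted plan names.
        return sorted(self.environments)


if __name__ == "__main__":
    # A plain dict stands in for the Mistral Environment objects.
    store = {"overcloud": {"parameters": {}}, "lab": {"parameters": {}}}
    print(ListPlansAction(store).run())
```

&lt;p&gt;Keeping all state in the store that is passed in, rather than on the action itself, matches the statement that persistent state lives in a Mistral Environment object.&lt;/p&gt;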
&lt;p&gt;In some cases, an OpenStack Service may be missing a feature needed for TripleO
or it might only be accessible through its associated Python client.  To
mitigate this issue in the short term, some of the Actions will need to be
executed directly with an Action Execution [3] which calls the Action directly and
returns instantly, but also doesn’t have access to the same context as a
Workflow Execution.  In theory, every action execution should be replaced by an
OpenStack service API call.&lt;/p&gt;
&lt;p&gt;Below is a summary of the intended Workflows and Actions to be executed from the
CLI or the GUI using the python-mistralclient or Mistral API.  There may be
additional actions or library code necessary to enable these operations that
will not be intended to be consumed directly.&lt;/p&gt;
&lt;p&gt;Workflows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Node Registration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Node Introspection&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Plan Creation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Plan Deletion&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Validation Operations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Actions:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Plan List&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get Capabilities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update Capabilities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get Parameters&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update Parameters&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Roles List&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;For Flavors and Image management, the Nova and Glance APIs will be used
respectively.&lt;/p&gt;
&lt;p&gt;The registration and introspection of nodes will be implemented within a
Mistral Workflow.  The logic is currently in tripleoclient and will be ported,
as certain node configurations are specified as part of the logic (ramdisk,
kernel names, etc.) so the user does not have to specify those.  Tagging,
listing and deleting nodes will happen via the Ironic/Inspector APIs as
appropriate.&lt;/p&gt;
&lt;p&gt;A deployment plan consists of a collection of heat templates in a Swift
container, combined with data stored in a Mistral Environment.  When the plan is
first created, the capabilities map data will be parsed and stored in the
associated Mistral Environment.  The templates will need to be uploaded to a
Swift container with the same name as the stack to be created.  While any user
could use a raw POST request to accomplish this, the GUI and CLI will provide
convenience functions to improve the user experience.  The convenience functions
will be implemented in an Action that can be used directly or included in a
Workflow.&lt;/p&gt;
&lt;p&gt;The deletion of a plan will be implemented in a Workflow to ensure there isn’t
an associated stack before deleting the templates, container and Mistral
Environment.  Listing the plans will be accomplished by calling
‘mistral environment-list’.&lt;/p&gt;
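&lt;p&gt;The plan lifecycle described in the two paragraphs above can be sketched in Python as follows. All names are hypothetical, and plain dictionaries stand in for the Swift containers, the Mistral Environments and the Heat stacks; the sketch only shows the ordering of checks, not real service calls.&lt;/p&gt;

```python
class PlanError(Exception):
    """Raised when a plan operation cannot proceed."""


def create_plan(name, templates, capabilities, containers, environments):
    """Store templates in a container and capabilities in an environment."""
    if name in containers:
        raise PlanError("plan %s already exists" % name)
    # The container is named after the stack that will be created.
    containers[name] = dict(templates)
    environments[name] = {"capabilities": dict(capabilities)}


def delete_plan(name, stacks, containers, environments):
    """Delete a plan only when no stack of the same name exists."""
    if name in stacks:
        raise PlanError("stack %s exists, refusing to delete plan" % name)
    containers.pop(name, None)
    environments.pop(name, None)


if __name__ == "__main__":
    containers, environments, stacks = {}, {}, set()
    create_plan(
        "overcloud",
        {"overcloud.yaml": "heat_template_version: 2015-04-30"},
        {"environments/example.yaml": "Example capability"},
        containers,
        environments,
    )
    print(sorted(environments))
    delete_plan("overcloud", stacks, containers, environments)
    print(sorted(environments))
```

&lt;p&gt;Checking for an existing stack before removing anything is the reason the spec places deletion in a Workflow rather than a single Action.&lt;/p&gt;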
&lt;p&gt;To get a list of the available Heat environment files with descriptions and
constraints, the library will have an Action that returns the information about
capabilities added during plan creation and identifies which Heat environment
files have already been selected.  There will also be an action that accepts a
list of user selected Heat environment files and stores the information in the
Mistral Environment.  It would be inconvenient to use a Workflow for these
actions as they just read or update the Mistral Environment and do not require
additional logic.&lt;/p&gt;
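&lt;p&gt;A rough Python sketch of the two capability Actions follows. It assumes, purely for illustration, that the environment is a dictionary with a capabilities mapping (file name to description) and a selected list; both function names and the data layout are hypothetical.&lt;/p&gt;

```python
def get_capabilities(environment):
    """Return each known environment file with its selection state."""
    selected = set(environment.get("selected", []))
    return [
        {"file": name, "description": description, "selected": name in selected}
        for name, description in sorted(environment["capabilities"].items())
    ]


def update_capabilities(environment, selected_files):
    """Store the user's selection, rejecting unknown environment files."""
    unknown = sorted(set(selected_files) - set(environment["capabilities"]))
    if unknown:
        raise ValueError("unknown environment files: %s" % ", ".join(unknown))
    environment["selected"] = sorted(selected_files)
    return environment


if __name__ == "__main__":
    env = {
        "capabilities": {
            "environments/example-a.yaml": "Example option A",
            "environments/example-b.yaml": "Example option B",
        }
    }
    update_capabilities(env, ["environments/example-b.yaml"])
    for entry in get_capabilities(env):
        print(entry["file"], entry["selected"])
```

&lt;p&gt;Both functions only read or update the environment data, which is why plain Actions are sufficient here and a Workflow is not needed.&lt;/p&gt;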
&lt;p&gt;The identification of Roles will be implemented in a Workflow that calls out to
Heat.&lt;/p&gt;
&lt;p&gt;To obtain the deployment parameters, Actions will be created that will call out
to heat with the required template information to obtain the parameters and set
the parameter values to the Environment.&lt;/p&gt;
&lt;p&gt;To perform TripleO validations, Workflows and associated Actions will be created
to support list, start, stop, and results operations.  See the spec [4] for more
information on how the validations will be implemented with Mistral.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;One alternative is to force non-CLI UIs to re-implement the business logic
currently contained within the CLI.  This is not a good alternative.  Another
possible alternative would be to create a REST API [5] to abstract TripleO
deployment logic, but it would require considerably more effort to create and
maintain and has been discussed at length on the mailing list. [6][7]&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The --templates workflow will end up being modified to use the updated
tripleo-common library.&lt;/p&gt;
&lt;p&gt;Integrating with Mistral is a straightforward process and this may result in
increased usage.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Rather than writing workflow code in python-tripleoclient directly, developers will
now create Mistral Actions and Workflows that help implement the requirements.&lt;/p&gt;
&lt;p&gt;Right now, changing the overcloud deployment workflow is stressful because
the CLI and GUI code must each be updated individually.  Converging the two
makes this a far easier proposition.  However, developers will need to keep this
architecture in mind and ensure that changes to the --templates or --plan
workflow are maintained in the tripleo-common library (when appropriate) to
avoid unneeded divergence.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;rbrady&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;jtomasek&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;dprince&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;The work items required are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Develop the tripleo-common Mistral actions that provide all of the
functionality required for our deployment workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This involves moving much of the code out of python-tripleoclient and into
generic, narrowly focused, Mistral actions that can be consumed via the
Mistral API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create new Mistral workflows to help with high level things like deployment,
introspection, node registration, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;tripleo-common is more of an internal library, and its logic is meant to be
consumed (almost) solely through Mistral actions. As far as possible, projects
should not circumvent the API by using tripleo-common directly as a library.
There may be some exceptions to this for common polling functions, etc., but in
general all core workflow logic should be API driven.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the CLI to consume these Mistral actions directly via
python-mistralclient.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All patches that implement these changes must pass CI and add additional tests
as needed.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The TripleO CI should be updated to test the updated tripleo-common library.&lt;/p&gt;
&lt;p&gt;Our intent is to make tripleoclient consume Mistral actions as we write them.
Because all of the existing upstream TripleO CI relies on tripleoclient, taking
this approach ensures that all of our workflow actions always work. This
should get us coverage on 90% of the Mistral actions and workflows and allow us
to proceed with the implementation iteratively/quickly. Once the UI is installed
and part of our upstream CI we can also rely on coverage there to ensure we
don’t have breakages.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Mistral Actions and Workflows are largely self-documenting and can be easily
introspected by running ‘mistral workflow-list’ or ‘mistral action-list’ on the
command line.  The updated library, however, will have to be well-documented and
meet OpenStack standards.  Documentation will be needed in both the
tripleo-common and tripleo-docs repositories.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://specs.openstack.org/openstack/heat-specs/specs/mitaka/resource-capabilities.html"&gt;https://specs.openstack.org/openstack/heat-specs/specs/mitaka/resource-capabilities.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[2] &lt;a class="reference external" href="https://specs.openstack.org/openstack/heat-specs/specs/liberty/nested-validation.html"&gt;https://specs.openstack.org/openstack/heat-specs/specs/liberty/nested-validation.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[3] &lt;a class="reference external" href="http://docs.openstack.org/developer/mistral/terminology/executions.html"&gt;http://docs.openstack.org/developer/mistral/terminology/executions.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[4] &lt;a class="reference external" href="https://review.openstack.org/#/c/255792/"&gt;https://review.openstack.org/#/c/255792/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[5] &lt;a class="reference external" href="http://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-overcloud-deployment-library.html"&gt;http://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-overcloud-deployment-library.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[6] &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-January/083943.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2016-January/083943.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[7] &lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2016-January/083757.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2016-January/083757.html&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Mon, 15 Feb 2016 00:00:00 </pubDate></item><item><title>TripleO Quickstart</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-quickstart.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-quickstart"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-quickstart&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We need a common way for developers/CI systems to quickly stand up a virtual
environment.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The tool we currently document for this use case is instack-virt-setup.
However this tool has two major issues, and some missing features:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;There is no upstream CI using it. This means we have no way to test changes
other than manually. This is a huge barrier to adding the missing features.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It relies on a maze of bash scripts in the incubator repository[1] in order
to work. This is a barrier to new users, as it can take quite a bit of time
to find and then navigate that maze.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It has no way to use a pre-built undercloud image instead of starting from
scratch and redoing the same work that CI and every other tripleo developer
is doing on every run. Starting from a pre-built undercloud with overcloud
images prebaked can be a significant time savings for both CI systems as well
as developer test environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It has no way to create this undercloud image either.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There are other smaller missing features like automatically tagging the fake
baremetals with profile capability tags via instackenv.json. These would not
be too painful to implement, but without CI even small changes carry some
amount of pain.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Import the tripleo-quickstart[2] tool that RDO is using for this purpose.
This project is a set of ansible roles that can be used to build an
undercloud.qcow2, or alternatively to consume it. It was patterned after
instack-virt-setup, and anything configurable via instack-virt-setup is
configurable in tripleo-quickstart.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use third-party CI for self-gating this new project. In order to set up an
environment similar to how developers and users can use this tool, we need
a baremetal host. The CI that currently self-gates this project is set up on
ci.centos.org[3], and setting this up as third-party CI would not be hard.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;One alternative is to keep using instack-virt-setup for this use case.
However, we would still need to add CI for instack-virt-setup. This would
still need to be outside of tripleoci, since it requires a baremetal host.
Unless someone is volunteering to set that up, this is not really a viable
alternative.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Similarly, we could use some other method for creating virtual environments.
However, this alternative is similarly constrained by needing third-party CI
for validation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Using a pre-built undercloud.qcow2 drastically simplifies the virt-setup
instructions, which makes the process less error prone. This should lead to a
better new user experience of TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Using a pre-built undercloud.qcow2 will shave 30+ minutes from the CI
gate jobs.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;There is no reason this same undercloud.qcow2 could not be used to deploy
real baremetal environments. There have been many production deployments of
TripleO that have used a VM undercloud.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;The undercloud.qcow2 approach makes it much easier and faster to reproduce
exactly what is run in CI. This leads to a much better developer experience.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;trown&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Import the existing work from the RDO community to the openstack namespace
under the TripleO umbrella.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set up third-party CI running in ci.centos.org to self-gate this new project.
(We can just update the current CI[3] to point at the new upstream location.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation will need to be updated for the virtual environment setup.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Currently, the only undercloud.qcow2 available is built in RDO. We would
either need to build one in tripleo-ci, or use the one built in RDO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We need a way to CI the virtual environment setup. This is not feasible within
tripleoci, since it requires a baremetal host machine. We will need to rely on
third party CI for this.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Overall this will be a major simplification of the documentation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://github.com/openstack/tripleo-incubator/tree/master/scripts"&gt;https://github.com/openstack/tripleo-incubator/tree/master/scripts&lt;/a&gt;
[2] &lt;a class="reference external" href="https://github.com/redhat-openstack/tripleo-quickstart"&gt;https://github.com/redhat-openstack/tripleo-quickstart&lt;/a&gt;
[3] &lt;a class="reference external" href="https://ci.centos.org/view/rdo/job/tripleo-quickstart-gate-mitaka-delorean-minimal/"&gt;https://ci.centos.org/view/rdo/job/tripleo-quickstart-gate-mitaka-delorean-minimal/&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Fri, 05 Feb 2016 00:00:00 </pubDate></item><item><title>TripleO Deployment Validations</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/validations.html</link><description>
 
&lt;p&gt;We need ways in TripleO for performing validations at various stages of the
deployment.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;TripleO deployments, and more generally all OpenStack deployments, are complex,
error prone, and highly dependent on the environment. An appropriate set of
tools can help engineers to identify potential problems as early as possible
and fix them before going further with the deployment.&lt;/p&gt;
&lt;p&gt;People have already developed such tools [1]; however, they appear more like
a random collection of scripts than a well integrated solution within TripleO.
We need to expose the validation checks from a library so they can be consumed
from the GUI or CLI without distinction and integrate seamlessly with the TripleO
deployment workflow.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;We propose to extend the TripleO Overcloud Deployment Mistral workflow [2] to
include Actions for validation checks.&lt;/p&gt;
&lt;p&gt;These actions will need at least to:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;List validations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run and stop validations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get validation status&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Persist and retrieve validation results&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Permit grouping validations by ‘deployment stage’ and execute group operations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Running validations will be implemented in a workflow to ensure the nodes meet
certain expectations. For example, a baremetal validation may require the node
to boot on a ramdisk first.&lt;/p&gt;
&lt;p&gt;Mistral workflow execution can be started with the &lt;cite&gt;mistral execution-create&lt;/cite&gt;
command and can be stopped with the &lt;cite&gt;mistral execution-update&lt;/cite&gt; command by
setting the workflow status to either SUCCESS or ERROR.&lt;/p&gt;
&lt;p&gt;Every run of the workflow (workflow execution) is stored in Mistral’s DB and
can be retrieved for later use. The workflow execution object contains all
information about the workflow and its execution, including all output data and
statuses for all the tasks composing the workflow.&lt;/p&gt;
&lt;p&gt;By introducing a consistent naming scheme for validation workflows, we can use
workflow names to identify the stage at which the validations should run and
trigger all validations of a given stage (e.g.
tripleo.validation.hardware.undercloudRootPartitionDiskSizeCheck).&lt;/p&gt;
&lt;p&gt;Using the naming conventions, the user is also able to register a new
validation workflow and add it to the existing ones.&lt;/p&gt;
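&lt;p&gt;The naming convention can be sketched in Python as follows. This is a minimal illustration that assumes names follow the pattern tripleo.validation.STAGE.CHECK, as in the example above; the function name and every workflow name other than that example are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict

# Assumed common prefix, taken from the example name in the text.
PREFIX = "tripleo.validation."


def group_by_stage(workflow_names):
    """Group validation workflow names by the stage segment of the name."""
    stages = defaultdict(list)
    for name in workflow_names:
        if not name.startswith(PREFIX):
            continue  # not a validation workflow
        parts = name[len(PREFIX):].split(".", 1)
        if len(parts) != 2:
            continue  # no stage segment before the check name
        stage = parts[0]
        stages[stage].append(name)
    return dict(stages)


names = [
    "tripleo.validation.hardware.undercloudRootPartitionDiskSizeCheck",
    "tripleo.validation.network.exampleCheck",  # hypothetical
    "tripleo.deployment.exampleWorkflow",  # hypothetical, not a validation
]
by_stage = group_by_stage(names)
```

&lt;p&gt;With such a grouping, triggering every validation of a given stage amounts to looking up the stage and starting each listed workflow. A newly registered workflow joins its stage simply by following the naming scheme.&lt;/p&gt;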
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;One alternative is to ship a collection of scripts within TripleO to be run by
engineers at different stages of the deployment. This solution is not optimal
because it requires a lot of manual work and does not integrate with the UI.&lt;/p&gt;
&lt;p&gt;Another alternative is to build our own API, but it would require significantly
more effort to create and maintain. This topic has been discussed at length on
the mailing list.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The whole point behind the validations framework is to permit running scripts
on the nodes, thus providing access from the control node to the deployed nodes
at different stages of the deployment. Special care needs to be taken to grant
access to the target nodes using secure methods and ensure only trusted scripts
can be executed from the library.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;We expect reduced deployment time thanks to early issue detection.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will need to keep the TripleO CI updated with changes, and will be
responsible for fixing the CI as needed.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;shadower&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;mandre&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;The work items required are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Develop the tripleo-common Mistral actions that provide all of the
functionality required for the validation workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write an initial set of validation checks based on real deployment
experience, starting by porting existing validations [1] to work with the
implemented Mistral actions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All patches that implement these changes must pass CI and add additional tests as
needed.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;We are dependent upon the tripleo-mistral-deployment-library [2] work.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The TripleO CI should be updated to test the updated tripleo-common library.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Mistral Actions and Workflows are largely self-documenting and can be easily
introspected by running ‘mistral workflow-list’ or ‘mistral action-list’ on the
command line.  The updated library, however, will have to be well-documented and
meet OpenStack standards.  Documentation will be needed in both the
tripleo-common and tripleo-docs repositories.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;[1] Set of tools to help detect issues during TripleO deployments:
&lt;a class="reference external" href="https://github.com/rthallisey/clapper"&gt;https://github.com/rthallisey/clapper&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[2] Library support for TripleO Overcloud Deployment Via Mistral:
&lt;a class="reference external" href="https://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-mistral-deployment-library.html"&gt;https://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-mistral-deployment-library.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Thu, 10 Dec 2015 00:00:00 </pubDate></item><item><title>TripleO UI</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-ui.html</link><description>
 
&lt;p&gt;We need a graphical user interface that will support deploying OpenStack using
TripleO.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Tuskar-UI, the only currently existing GUI capable of TripleO deployments, has
several significant issues.&lt;/p&gt;
&lt;p&gt;Firstly, its back-end relies on an obsolete version of the Tuskar API, which is
insufficient for complex overcloud deployments.&lt;/p&gt;
&lt;p&gt;Secondly, it is implemented as a Horizon plugin and placed under the Horizon
umbrella, which has proven to be suboptimal, for several reasons:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The placement under the Horizon program. In order to be able to develop the
Tuskar-UI, one needs deep familiarity with both Horizon and TripleO projects.
Furthermore, in order to be able to approve patches, one needs to be a
Horizon core reviewer. This restriction drastically reduces the number of
people who can contribute and makes it hard for Tuskar-UI developers to
actually land code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The complexity of the Horizon Django application. Horizon is a very complex
heavyweight application comprised of many OpenStack services. It has become
very large, inflexible and consists of several unnecessary middle layers. As
a result of this, we have been witnessing the emergence of several new GUIs
implemented as independent (usually fully client-side JavaScript) applications,
rather than as Horizon plugins. Ironic webclient[1] is one such example. This
downside of Horizon has been recognized and an attempt to address it is
described in the next point.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The move to AngularJS (version 1). In an attempt to address the issues listed
above, the Horizon community decided to rewrite it in AngularJS. However,
instead of doing a total rewrite, they opted for a more gradual approach,
resulting in even more middle layers (the original Django layer turned into an
API for the Angular-based front end). Although the intention is to eventually
get rid of the unwanted layers, the move is happening very slowly. In
addition, this rewrite of Horizon is to AngularJS version 1, which may soon
become obsolete, with version 2 just around the corner. This probably means
another complete rewrite in the not too distant future.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Packaging issues. The move to AngularJS brought along a new set of issues
related to the poor state of packaging of nodejs based tooling in all major
Linux distributions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;In order to address the need for a TripleO based GUI, while avoiding the issues
listed above, we propose introducing a new GUI project, &lt;em&gt;TripleO UI&lt;/em&gt;, under the
TripleO program.&lt;/p&gt;
&lt;p&gt;As it is a TripleO-specific UI, TripleO UI will be placed under the TripleO
program, which will bring it to the attention of TripleO reviewers and allow
TripleO core reviewers to approve patches. This should facilitate the code
contribution process.&lt;/p&gt;
&lt;p&gt;TripleO UI will be a web UI designed for overcloud deployment and
management. It will be a lightweight, independent client-side application,
designed for flexibility, adaptability and reusability.&lt;/p&gt;
&lt;p&gt;TripleO UI will be a fully client-side JavaScript application. It will be
stateless and contain no business logic. It will consume the TripleO REST API[2],
which will expose the overcloud deployment workflow business logic implemented
in the tripleo-common library[3]. As opposed to the previous architecture which
included many unwanted middle layers, this one will be very simple, consisting
only of the REST API serving JSON, and the client-side JavaScript application
consuming it.&lt;/p&gt;
&lt;p&gt;The development stack will consist of ReactJS[4] and Flux[5]. We will use ReactJS
to implement the web UI components, and Flux for architecture design.&lt;/p&gt;
&lt;p&gt;Due to the packaging problems described above, we will not provide any packages
for the application for now. We will simply make the code available for use.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The alternative is to keep developing Tuskar-UI under the Horizon umbrella. In
addition to all the problems outlined above, this approach would also mean a
complete re-write of Tuskar-UI back-end to make it use the new tripleo-common
library.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;This proposal introduces a brand new application; all the standard security
concerns which come with building a client-side web application apply.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;We plan to build a standalone web UI which will be capable of deploying
OpenStack with TripleO. Since as of now no such GUIs exist, this can be a huge
boost for adoption of TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The proposed technology stack, ReactJS and Flux, has excellent performance
characteristics. TripleO UI should be a lightweight, fast, flexible application.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Right now, development on Tuskar-UI is uncomfortable for the reasons
detailed above. This proposal should result in more comfortable development
as it logically places TripleO UI under the TripleO program, which brings
it under the direct attention of TripleO developers and core reviewers.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;jtomasek&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;flfuchs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;jrist&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;lt;TBD person with JS &amp;amp; CI skills&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;This is a general proposal regarding the adoption of a new graphical user
interface under the TripleO program. The implementation of specific features
will be covered in subsequent proposals.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;We are dependent upon the creation of the TripleO REST API[2], which in turn
depends on the tripleo-common[3] library containing all the functionality
necessary for advanced overcloud deployment.&lt;/p&gt;
&lt;p&gt;Alternatively, using Mistral to provide a REST API, instead of building a new
API, is currently being investigated as another option.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;TripleO UI should be thoroughly tested, including unit tests and integration
tests. Every new feature and bug fix should be accompanied by appropriate tests.&lt;/p&gt;
&lt;p&gt;The TripleO CI should be updated to test the TripleO UI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;TripleO UI will have to be well-documented and meet OpenStack standards.
We will need both developer and deployment documentation. Documentation will
live in the tripleo-docs repository.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://github.com/openstack/ironic-webclient"&gt;https://github.com/openstack/ironic-webclient&lt;/a&gt;
[2] &lt;a class="reference external" href="https://review.openstack.org/#/c/230432"&gt;https://review.openstack.org/#/c/230432&lt;/a&gt;
[3] &lt;a class="reference external" href="http://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-overcloud-deployment-library.html"&gt;http://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-overcloud-deployment-library.html&lt;/a&gt;
[4] &lt;a class="reference external" href="https://facebook.github.io/react/"&gt;https://facebook.github.io/react/&lt;/a&gt;
[5] &lt;a class="reference external" href="https://facebook.github.io/flux/"&gt;https://facebook.github.io/flux/&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Fri, 23 Oct 2015 00:00:00 </pubDate></item><item><title>Workflow Simplification</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/newton/workflow-simplification.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/workflow-simplification"&gt;https://blueprints.launchpad.net/tripleo/+spec/workflow-simplification&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The TripleO workflow is still too complex for many (most?) users to follow
successfully.  There are some fairly simple steps we can take to improve
that situation.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The current TripleO workflow grew somewhat haphazardly out of a collection
of bash scripts that originally made up instack-undercloud.  These scripts
started out life as primarily a proof of concept exercise to demonstrate
that the idea was viable, and while the steps still work fine when followed
correctly, it seems “when followed correctly” is too difficult today, at least
based on the feedback I’m hearing from users.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;There seem to be a number of low-hanging fruit candidates for cleanup.  In the
order in which they appear in the docs, these would be:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Node registration&lt;/strong&gt; Why is this two steps?  Is there ever a case where we
would want to register a node but not configure it to be able to boot?
If there is, is it a significant enough use case to justify the added
step every time a user registers nodes?&lt;/p&gt;
&lt;p&gt;I propose that we configure boot on newly registered nodes automatically.
Note that this will probably require us to also update the boot
configuration when updating images, but again this is a good workflow
improvement.  Users are likely to forget to reconfigure their nodes’ boot
images after updating them in Glance.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;This would not remove the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;openstack&lt;/span&gt; &lt;span class="pre"&gt;baremetal&lt;/span&gt; &lt;span class="pre"&gt;configure&lt;/span&gt; &lt;span class="pre"&gt;boot&lt;/span&gt;&lt;/code&gt;
command for independently updating the boot configuration of
Ironic nodes.  In essence, it would just always call the
configure boot command immediately after registering nodes so
it wouldn’t be a mandatory step.&lt;/p&gt;
&lt;p&gt;This also means that the deploy ramdisk would have to be built
and loaded into Glance before registering nodes, but our
documented process already satisfies that requirement, and we
could provide a --no-configure-boot parameter to the import command for cases
where someone wanted to register nodes without configuring them.&lt;/p&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flavor creation&lt;/strong&gt; Nowhere in our documentation do we recommend or
provide guidance on customizing the flavors that will be used for
deployment.  While it is possible to deploy solely based on flavor
hardware values (ram, disk, cpu), in practice it is often simpler
to just assign profiles to Ironic nodes and have scheduling done solely
on that basis.  This is also the method we document at this time.&lt;/p&gt;
&lt;p&gt;I propose that we simply create all of the recommended flavors at
undercloud install time and assign them the appropriate localboot and
profile properties at that time.  These flavors would be created with the
minimum supported cpu, ram, and disk values so they would work for any
valid hardware configuration.  This would also reduce the possibility of
typos in the flavor creation commands causing avoidable deployment
failures.&lt;/p&gt;
&lt;p&gt;These default flavors can always be customized if a user desires, so there
is no loss of functionality from making this change.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Node profile assignment&lt;/strong&gt; This is not currently part of the standard
workflow, but in practice it is something we need to be doing for most
real-world deployments with heterogeneous hardware for controllers,
computes, cephs, etc.  Right now the documentation requires running an
ironic node-update command specifying all of the necessary capabilities
(in the manual case anyway; this section does not apply to the AHC
workflow).&lt;/p&gt;
&lt;p&gt;os-cloud-config does have support for specifying the node profile in
the imported JSON file, but to my knowledge we don’t mention that anywhere
in the documentation.  This would be the lowest of low-hanging
fruit since it’s simply a question of documenting something we already
have.&lt;/p&gt;
&lt;p&gt;We could even give the generic baremetal flavor a profile and have our
default instackenv.json template include that[1], with a note that it can
be overridden to a more specific profile if desired.  If users want to
change a profile assignment after registration, the node update command
for ironic will still be available.&lt;/p&gt;
&lt;p&gt;1. For backwards compatibility, we might want to instead create a new flavor
named something like ‘default’ and use that, leaving the old baremetal
flavor as an unprofiled thing for users with existing unprofiled nodes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;section id="tripleo-sh"&gt;
&lt;h4&gt;tripleo.sh&lt;/h4&gt;
&lt;p&gt;tripleo.sh addresses the problem to some extent for developers, but it is
not a viable option for real world deployments (nor should it be IMHO).
However, it may be valuable to look at tripleo.sh for guidance on a simpler
flow that can be more easily followed, as that is largely the purpose of the
script.  A similar flow codified into the client/API would be a good result
of these proposed changes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="node-registration"&gt;
&lt;h4&gt;Node Registration&lt;/h4&gt;
&lt;p&gt;One option Dmitry has suggested is to make the node registration operation
idempotent, so that it can be re-run any number of times and will simply
update the details of any already registered nodes.  He also suggested
moving the bulk import functionality out of os-cloud-config and (hopefully)
into Ironic itself.&lt;/p&gt;
&lt;p&gt;I’m totally in favor of both these options, but I suspect that they will
represent a significantly larger amount of work than the other items in this
spec, so I think I’d like that to be addressed as an independent spec since
this one is already quite large.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Minimal, if any.  This is simply combining existing deployment steps.  If we
were to add a new API for node profile assignment that would have some slight
security impact as it would increase our attack surface, but I feel even that
would be negligible.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Simpler deployments.  This is all about the end user.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Some individual steps may take longer, but only because they will be
performing actions that were previously in separate steps.  In aggregate
the process should take about the same time.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;If all of these suggested improvements are implemented, it will make the
standard deployment process somewhat less flexible.  However, in the
Proposed Change section I attempted to address any such new limitations,
and I feel they are limited to the edgiest of edge cases that in most cases
can still be implemented through some extra manual steps (which likely would
have been necessary anyway - they are edge cases after all).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;There will be some changes in the basic workflow, but as noted above the same
basic steps will be getting run.  Developers will see some impact from the
proposed changes, but as they will still likely be using tripleo.sh for an
already simplified workflow it should be minimal.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;bnemec&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Configure boot on newly registered nodes automatically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reconfigure boot on nodes after deploy images are updated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove explicit step for configure boot from the docs, but leave the actual
function itself in the client so it can still be used when needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create flavors at undercloud install time and move documentation on creating
them manually to the advanced section of the docs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add a ‘default’ flavor to the undercloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the sample instackenv.json to include setting a profile (by default,
the ‘default’ profile associated with the flavor from the previous step).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;Nothing that I’m aware of.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;As these changes are implemented, we would need to update tripleo.sh to match
the new flow, which will result in the changes being covered in CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;This should reduce the number of steps in the basic deployment flow in the
documentation.  It is intended to simplify the documentation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Proposed change to create flavors at undercloud install time:
&lt;a class="reference external" href="https://review.openstack.org/250059"&gt;https://review.openstack.org/250059&lt;/a&gt;
&lt;a class="reference external" href="https://review.openstack.org/251555"&gt;https://review.openstack.org/251555&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Wed, 21 Oct 2015 00:00:00 </pubDate></item><item><title>External Load Balancer</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/external-load-balancer.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-mitaka-external-load-balancer"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-mitaka-external-load-balancer&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Make it possible to use (optionally) an external load balancer as frontend for
the Overcloud.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;To use an external load balancer the Overcloud templates and manifests will be
updated to accomplish the following three goals:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;accept a list of virtual IPs as parameter to be used instead of the virtual
IPs which are normally created as Neutron ports and hosted by the controllers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;make the deployment and configuration of HAProxy on the controllers optional&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;allow for the assignment of a predefined list of IPs to the controller nodes
so that these can be used for the external load balancer configuration&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The VipMap structure, governed by the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Network::Ports::NetIpMap&lt;/span&gt;&lt;/code&gt;
resource type, will be switched to &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Network::Ports::NetVipMap&lt;/span&gt;&lt;/code&gt;,
a more specific resource type so that it can be pointed to a custom YAML allowing
for the VIPs to be provided by the user at deployment time. Any reference to the
VIPs in the templates will be updated to gather the VIP details from such a
structure. The existing VIP resources will also be switched from the
non-specialized type &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Controller::Ports::InternalApiPort&lt;/span&gt;&lt;/code&gt; into a
more specific type &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Network::Ports::InternalApiVipPort&lt;/span&gt;&lt;/code&gt; so that
it will be possible to noop the VIPs or add support for more parameters as
required and independently from the controller ports resource.&lt;/p&gt;
&lt;p&gt;The deployment and configuration of HAProxy on the controller nodes will become
optional and driven by a new template parameter visible only to the controllers.&lt;/p&gt;
&lt;p&gt;It will be possible to provide via template parameters a predefined list of IPs
to be assigned to the controller nodes, on each network, so that these can be
configured as target IPs in the external load balancer, before the deployment
of the Overcloud is initiated. A new port YAML will be provided for the purpose;
when using an external load balancer this will be used for resources like
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;OS::TripleO::Controller::Ports::InternalApiPort&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;As a requirement for the deployment process to succeed, the external load
balancer must be configured in advance with the appropriate balancing rules and
target IPs. This is because the deployment process itself uses a number of
infrastructure services (database/messaging) as well as core OpenStack services
(Keystone) during the configuration steps. A validation script will be provided
so that connectivity to the VIPs can be tested in advance and hopefully avoid
false negatives during the deployment.&lt;/p&gt;
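&lt;p&gt;As a rough sketch of how these pieces might fit together, an environment
file for a deployment behind an external load balancer could look like the
following. The resource types are the ones named above; the template paths,
and every parameter name and address under &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;parameter_defaults&lt;/span&gt;&lt;/code&gt;, are
hypothetical placeholders rather than names defined by this spec:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;resource_registry:
  # Point the VIP map at a custom YAML that takes the VIPs from parameters
  # (hypothetical template path)
  OS::TripleO::Network::Ports::NetVipMap: /path/to/network/ports/custom_vip_map.yaml
  # Noop the VIP port, since the VIP is hosted by the external balancer
  OS::TripleO::Network::Ports::InternalApiVipPort: /path/to/network/ports/noop.yaml
  # Use the new port YAML that assigns IPs from a predefined list
  # (hypothetical template path)
  OS::TripleO::Controller::Ports::InternalApiPort: /path/to/network/ports/predefined_ips.yaml

parameter_defaults:
  # Hypothetical parameter names and example addresses
  ExampleInternalApiVip: 172.16.2.250
  ExampleEnableHAProxy: false
  ExampleControllerIPs:
    internal_api:
      - 172.16.2.251
      - 172.16.2.252
      - 172.16.2.253
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The same addresses would be configured as the VIP and target IPs on the
external load balancer before the deployment is started.&lt;/p&gt;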
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;By filtering the incoming connections for the controller nodes, an external load
balancer might help the Overcloud survive network flood attacks or issues due
to purposely malformed API requests.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The deployer wishing to deploy with an external load balancer will have to
provide at deployment time a few more parameters, amongst which:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;the VIPs configured on the balancer to be used by the Overcloud services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the IPs to be configured on the controllers, for each network&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Given there won’t be any instance of HAProxy running on the controllers, when
using an external load balancer these might benefit from a lower stress on the
TCP stack.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None expected unless deploying with an external load balancer. A sample
environment file will be provided to give some guidance on the parameters
to be passed when deploying with an external load balancer.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;In those scenarios where the deployer was using only a subset of the isolated
networks, the customization templates will need to be updated so that the new
VIPs resource type is nooped. This can be achieved with something like:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;resource_registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;OS&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TripleO&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Network&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Ports&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;InternalApiVipPort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ports&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;noop&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;gfidente&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;dprince&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;accept user provided collection of VIPs as parameter&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;make the deployment of the managed HAProxy optional&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;allow for the assignment of a predefined list of IPs to the controller nodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add a validation script to test connectivity against the external VIPs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The feature seems untestable in CI at the moment but it will be possible to test
at least the assignment of a predefined list of IPs to the controller nodes by
providing only the predefined list of IPs as parameter.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;In addition to documenting the specific template parameters needed when
deploying with an external load balancer, it will also be necessary to provide
some guidance for the configuration of the load balancer so that
it will behave as expected in the event of a failure. Unfortunately the
configuration settings are strictly dependent on the balancer in use; we should
publish a copy of a managed HAProxy instance config to use as reference so that
a deployer could configure their external appliance similarly.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Mon, 12 Oct 2015 00:00:00 </pubDate></item><item><title>Library support for TripleO Overcloud Deployment</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/mitaka/tripleo-overcloud-deployment-library.html</link><description>
 
&lt;p&gt;We need a TripleO library that supports the overcloud deployment workflow.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;With Tuskar insufficient for complex overcloud deployments, TripleO has moved to
an overcloud deployment workflow that bypasses Tuskar.  This workflow can be
summarized as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The user edits the templates and environment file.  These can be stored
anywhere.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Templates may be validated by Heat.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Templates and environment are sent to Heat for overcloud deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Post-deploy, overcloud endpoints are configured.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;This workflow is already supported by the CLI.&lt;/p&gt;
&lt;p&gt;However, from a GUI perspective, although the workflow is straightforward, it is
not simple.  Here are some of the complications that arise:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Some of the business logic in this workflow is contained in the CLI itself,
making it difficult for other UIs to use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the TripleO overcloud deployment workflow changes, it is easy for the CLI
and GUI approach to end up on divergent paths - a dangerous situation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The CLI approach allows open-ended flexibility (the CLI doesn’t care where the
templates come from) that is detrimental for a GUI (the GUI user doesn’t care
where the templates are stored, but consistency in approach is desirable to
prevent divergence among GUIs).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;There is a need to create common code that accommodates the flexibility of the
CLI with the ease-of-use needs of Python-based GUI consumers.  Note that an API
will eventually be needed in order to accommodate non-Python GUIs.  The work
there will be detailed in a separate spec.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;In order to solve this problem, we propose the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Encapsulate the business logic involved in the overcloud deployment workflow
within the tripleo-common library.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a simplified workflow to hide unneeded complexity from GUI consumers
- for example, template storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the CLI to use this code where appropriate to prevent divergence with
GUIs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;The first two points deserve further explanation.  First, let us lay out the
proposed GUI workflow.  We will refer to the Heat files the user desires to use
for the overcloud deployment as a ‘plan’.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;A user creates a plan by pushing a copy of the Heat deployment templates into
a data store.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The user defines values for the template resource types given by Heat
template capabilities.  This results in an updated resource registry in an
environment file saved to the data store.
(&lt;a class="reference external" href="https://review.openstack.org/#/c/196656/7/specs/liberty/resource-capabilities.rst"&gt;https://review.openstack.org/#/c/196656/7/specs/liberty/resource-capabilities.rst&lt;/a&gt;)
Note that this spec will be completed by Mitaka at the earliest.  A
workaround is discussed below.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Now that the template resource types are specified, the user can configure
deployment parameters given by Heat.  Edited parameters are updated and an
updated environment file is saved to the data store.  ‘Roles’ no longer exist
in Tuskar, but can still be derived from available Heat parameters.
(&lt;a class="reference external" href="https://review.openstack.org/#/c/197199/5/specs/liberty/nested-validation.rst"&gt;https://review.openstack.org/#/c/197199/5/specs/liberty/nested-validation.rst&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Steps 2 and 3 can be repeated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With configuration complete, the user triggers the deployment of the
overcloud.  The templates and environment file are taken from the data store
and sent to Heat.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once overcloud deployment is complete, any needed post-deploy config is
performed.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In order to fulfill this workflow, we propose to initially promote the use of
Swift as the template data store.  This usage will be abstracted away behind
the tripleo-common library, and later updates may allow the use of other data
stores.&lt;/p&gt;
&lt;p&gt;Note that the Swift workflow is intended to be an alternative to the current CLI
‘--templates’ workflow.  Both would end up being options under the CLI; a user
could choose ‘--templates’ or ‘--plan’.  However, they would both be backed by
common tripleo-common library code, with the ‘--plan’ option simply calling
additional functions to pull the plan information from Swift.  Note also that GUIs
that expect a Swift-backed deployment would lose functionality if the overcloud
is deployed using the ‘--templates’ CLI workflow.&lt;/p&gt;
&lt;p&gt;The tripleo-common library functions needed are:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plan CRUD&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;create_plan(plan_name, plan_files)&lt;/strong&gt;: Creates a plan by creating a Swift
container matching plan_name, and placing all files needed for that plan
into that container (for Heat that would be the ‘parent’ templates, nested
stack templates, environment file, etc).  The Swift container will be
created with object versioning active to allow for versioned updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;get_plan(plan_name)&lt;/strong&gt;: Retrieves the Heat templates and environment file
from the Swift container matching plan_name.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;update_plan(plan_name, plan_files)&lt;/strong&gt;: Updates a plan by updating the
plan files in the Swift container matching plan_name.  This may necessitate
an update to the environment file to add and/or remove parameters. Although
updates are versioned, retrieval of past versions is deferred to future
work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;delete_plan(plan_name)&lt;/strong&gt;: Deletes a plan by deleting the Swift container
matching plan_name, but only if there is no deployed overcloud that was
deployed with the plan.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deployment Options&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;get_deployment_plan_resource_types(plan_name)&lt;/strong&gt;: Determine available
template resource types by retrieving plan_name’s templates from Swift and
using the proposed Heat resource-capabilities API
(&lt;a class="reference external" href="https://review.openstack.org/#/c/196656/7/specs/liberty/resource-capabilities.rst"&gt;https://review.openstack.org/#/c/196656/7/specs/liberty/resource-capabilities.rst&lt;/a&gt;).
If that API is not ready in the required timeframe, then we will implement
a temporary workaround - a manually created map between templates and
provider resources.  We would work closely with the spec developers to try
and ensure that the output of this method matches their proposed output, so
that once their API is ready, replacement is easy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;update_deployment_plan_resource_types(plan_name, resource_types)&lt;/strong&gt;:
Retrieve plan_name’s environment file from Swift and update the
resource_registry tree according to the values passed in by resource_types.
Then update the environment file in Swift.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deployment Configuration&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;get_deployment_parameters(plan_name)&lt;/strong&gt;: Determine available deployment
parameters by retrieving plan_name’s templates from Swift and using the
proposed Heat nested-validation API call
(&lt;a class="reference external" href="https://review.openstack.org/#/c/197199/5/specs/liberty/nested-validation.rst"&gt;https://review.openstack.org/#/c/197199/5/specs/liberty/nested-validation.rst&lt;/a&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;update_deployment_parameters(plan_name, deployment_parameters)&lt;/strong&gt;:
Retrieve plan_name’s environment file from Swift and update the parameters
according to the values passed in by deployment_parameters.  Then update the
environment file in Swift.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;get_deployment_roles(plan_name)&lt;/strong&gt;: Determine available deployment roles.
This can be done by retrieving plan_name’s deployment parameters and
deriving available roles from parameter names; or by looking at the
top-level ResourceGroup types.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;validate_plan(plan_name)&lt;/strong&gt;: Retrieve plan_name’s templates and environment
file from Swift and use them in a Heat API validation call.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;deploy_plan(plan_name)&lt;/strong&gt;: Retrieve plan_name’s templates and environment
file from Swift and use them in a Heat API call to create the overcloud
stack.  Perform any needed pre-processing of the templates, such as the
template file dictionary needed by Heat.  This function will return a Heat
stack ID that can be used to monitor the status of the deployment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Post-Deploy&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;postdeploy_plan(plan_name)&lt;/strong&gt;: Initialize the API endpoints of the
overcloud corresponding to plan_name.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
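&lt;p&gt;To make the effect of the two update functions more concrete, the
environment file stored in a plan’s Swift container might look roughly like
this after calls to update_deployment_plan_resource_types and
update_deployment_parameters. The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;resource_registry&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;parameters&lt;/span&gt;&lt;/code&gt;
sections are standard Heat environment sections; the resource type, template
path, and parameter names and values shown are illustrative assumptions only:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;# Environment file stored in the plan's Swift container (illustrative)
resource_registry:
  # Written by update_deployment_plan_resource_types(plan_name, resource_types)
  # (hypothetical resource type and template path)
  OS::TripleO::Example::ResourceType: example/provider-template.yaml

parameters:
  # Written by update_deployment_parameters(plan_name, deployment_parameters)
  # (hypothetical parameter names and values)
  ExampleControllerCount: 3
  ExampleComputeCount: 2
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;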
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The alternative is to force non-CLI UIs to re-implement the business logic
currently contained within the CLI.  This is not a good alternative.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The --templates workflow will end up being modified to use the updated
tripleo-common library.&lt;/p&gt;
&lt;p&gt;Python-based code would find it far easier to adopt the TripleO method of
deployment.  This may result in increased usage.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Right now, changing the overcloud deployment workflow results in stress due to
the need to individually update both the CLI and GUI code.  Converging the two
makes this a far easier proposition.  However developers will need to have this
architecture in mind and ensure that changes to the --templates or --plan
workflow are maintained in the tripleo-common library (when appropriate) to
avoid unneeded divergences.&lt;/p&gt;
&lt;p&gt;Another important item to note is that we will need to keep the TripleO CI
updated with changes, and will be responsible for fixing the CI as needed.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;Primary assignees:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;tzumainn&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;akrivoka&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;jtomasek&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;dmatthews&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;The work items required are:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Develop the tripleo-common library to provide the functionality described
above.  This also involves moving code from the CLI to tripleo-common.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the CLI to use the tripleo-common library.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;All patches that implement these changes must pass CI and add additional tests as
needed.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;We are dependent upon two Heat specs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Heat resource-capabilities API
(&lt;a class="reference external" href="https://review.openstack.org/#/c/196656/7/specs/liberty/resource-capabilities.rst"&gt;https://review.openstack.org/#/c/196656/7/specs/liberty/resource-capabilities.rst&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Heat nested-validation API
(&lt;a class="reference external" href="https://review.openstack.org/#/c/197199/5/specs/liberty/nested-validation.rst"&gt;https://review.openstack.org/#/c/197199/5/specs/liberty/nested-validation.rst&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The TripleO CI should be updated to test the updated tripleo-common library.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The updated library with its Swift-backed workflow will have to be
well-documented and meet OpenStack standards.  Documentation will be needed in both
the tripleo-common and tripleo-docs repositories.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Tue, 29 Sep 2015 00:00:00 </pubDate></item><item><title>Release Branch proposal for TripleO</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/liberty/release-branch.html</link><description>
 
&lt;p&gt;To date, the majority of folks consuming TripleO have been doing so via the
master branches of the various repos required to allow TripleO to deploy
an OpenStack cloud.  This proposes an alternative “release branch” methodology
which should enable those consuming stable OpenStack releases to deploy
more easily using TripleO.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Historically, strong guarantees about deploying the current stable OpenStack
release have not been made, and it’s not something we’ve been testing in
upstream CI.  This is fine from a developer perspective, but it’s a major
impediment to those wishing to deploy production clouds based on the stable
OpenStack releases/branches.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;I propose we consider supporting additional “release” branches, for selected
TripleO repos where release-specific changes are required.&lt;/p&gt;
&lt;p&gt;The model will be based on the stable branch model[1] used by many/most
OpenStack projects, but with one difference: “feature” backports will be
permitted provided they are 100% compatible with the currently released
OpenStack services.&lt;/p&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;The justification for allowing features is that many/most TripleO features are
actually enabling access to features of OpenStack services which will exist in
the stable branches of the services being deployed.  Thus, the target audience
of this branch will likely want to consume such “features” to better access
features and configurations which are appropriate to the OpenStack release they
are consuming.&lt;/p&gt;
&lt;p&gt;The other aspect of justification is that projects are adding features
constantly, thus it’s unlikely TripleO will be capable of aligning with every
possible new feature for, say, Liberty on day 1 of the release being made.  Recognizing
that we’ll be playing “catch up”, and adopting a suitable branch
policy, should mean there is scope to continue that alignment after the services
themselves have been released, which will be of benefit to our users.&lt;/p&gt;
&lt;p&gt;Changes landing on the master branch can be considered as valid candidates for
backport, unless:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The patch requires new features of an OpenStack service (that do not exist
on the stable branches) to operate. E.g. if a tripleo-heat-templates change
needs new-for-liberty Heat features it would &lt;em&gt;not&lt;/em&gt; be allowed for release/kilo.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The patch enables Overcloud features of an OpenStack service that do not
exist on the stable branches of the supported Overcloud version (e.g. for
release/kilo we only support kilo overcloud features).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;User visible interfaces are modified, renamed or removed - removal of
deprecated interfaces may be allowed on the master branch (after a suitable
deprecation period), but these changes would &lt;em&gt;not&lt;/em&gt; be valid for backport as
they could impact existing users without warning.  Adding new interfaces
such as provider resources or parameters would be permitted provided the
default behavior does not impact existing users of the release branch.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The patch introduces new dependencies or changes the current requirements.txt.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To make it easier to identify not-valid-for-backport changes, it’s proposed
that a review process be adopted whereby a developer proposing a patch to
master would tag a commit if it doesn’t meet the criteria above, or there is
some other reason why the patch would be unsuitable for backport.&lt;/p&gt;
&lt;p&gt;e.g:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;No-Backport: This patch requires new for Mitaka Heat features&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;The main alternative to this is to leave upstream TripleO as something which
primarily targets developer/trunk-chasing users, and leave maintaining a
stable branch of the various components to downstream consumers of TripleO,
rdo-manager for example.&lt;/p&gt;
&lt;p&gt;The disadvantage of this approach is it’s an impediment to adoption and
participation in the upstream project, so I feel it’d be better to do this work
upstream, and improve the experience for those wishing to deploy via TripleO
using only the upstream tools and releases.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;We’d need to ensure security related patches landing in master got
appropriately applied to the release branches (same as stable branches for all
other projects).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;This should make it much easier for end users to stand up a TripleO deployed
cloud using the stable released versions of OpenStack services.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;This may reduce duplication of effort when multiple downstream consumers of
TripleO exist.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;The proposal of valid backports will ideally be made by the developer
proposing a patch to the master branch. To avoid creating an undue barrier to
entry for new contributors, this will not be mandatory, but it will be recommended
and encouraged via code review comments.&lt;/p&gt;
&lt;p&gt;Standard stable-maint processes[1] will be observed when proposing backports.&lt;/p&gt;
&lt;p&gt;We need to consider if we want a separate stable-maint core (as is common on
most other projects), or if all tripleo-core members can approve backports.
Initially we anticipate allowing all tripleo-core members to do so, potentially with the
addition of others with a specific interest in branch maintenance (e.g.
downstream package maintainers).&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Initially the following repos will gain release branches:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;openstack/tripleo-common&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;openstack/tripleo-docs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;openstack/tripleo-heat-templates&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;openstack/tripleo-puppet-elements&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;openstack/python-tripleoclient&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;openstack/instack-undercloud&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These will all have a new branch created, ideally near the time of the upcoming
Liberty release, and to avoid undue modification to existing infra tooling,
e.g. zuul, they will use the standard stable branch naming, e.g.:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;stable/liberty&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If any additional repos require stable branches, we can add those later when
required.&lt;/p&gt;
&lt;p&gt;It is expected that any repos which don’t have a stable/release branch must
maintain compatibility such that they don’t break deploying the stable released
OpenStack version (if this proves impractical in any case, we’ll create
branches when required).&lt;/p&gt;
&lt;p&gt;Also, when the release branches have been created, we will explicitly &lt;em&gt;not&lt;/em&gt;
require the master branch for those repos to observe backwards compatibility,
with respect to consuming new OpenStack features. For example, new-for-mitaka
Heat features may be consumed on the master branch of tripleo-heat-templates
after we have a stable/liberty branch for that repo.&lt;/p&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;shardy&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;TBC&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Identify the repos which require release branches&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create the branches&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Communicate need to backport to developers, consider options for automating&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CI jobs to ensure the release branch stays working&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation to show how users may consume the release branch&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We’ll need CI jobs configured to use the TripleO release branches, deploying
the stable branches of other OpenStack projects.  Hopefully we can make use of
e.g. RDO packages for most of the project stable branch content, then build
delorean packages for the tripleo release branch content.&lt;/p&gt;
&lt;p&gt;Ideally in future we’d also test upgrade from one release branch to another
(e.g. current release from the previous, and/or from the release branch to
master).&lt;/p&gt;
&lt;p&gt;As a starting point derekh has suggested we create a single CentOS job, which
only tests HA, and that we avoid having a tripleo-ci release branch,
ideally using the under-development[2] tripleo.sh developer script to abstract
any differences between deployment steps for branches.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We’ll need to update the docs to show:&lt;/p&gt;
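&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;How to deploy an undercloud node from the release branches using stable
OpenStack service versions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to build images containing content from the release branches&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to deploy an overcloud using only the release branch versions&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;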
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;We started discussing this idea in this thread:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2015-August/072217.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2015-August/072217.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[1] &lt;a class="reference external" href="https://wiki.openstack.org/wiki/StableBranch"&gt;https://wiki.openstack.org/wiki/StableBranch&lt;/a&gt;
[2] &lt;a class="reference external" href="https://review.openstack.org/#/c/225096/"&gt;https://review.openstack.org/#/c/225096/&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Wed, 09 Sep 2015 00:00:00 </pubDate></item><item><title>TripleO network configuration</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/network_configuration.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/os-net-config"&gt;https://blueprints.launchpad.net/tripleo/+spec/os-net-config&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We need a tool (or tools) to help configure host level networking
in TripleO. This includes things like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Static IPs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple OVS bridges&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bonding&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VLANs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Today in TripleO we bootstrap nodes using DHCP so they can download
custom per node metadata from Heat. This metadata contains per instance
network information that allows us to create a customized host level network
configuration.&lt;/p&gt;
&lt;p&gt;Today this is accomplished via two scripts:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ensure-bridge: &lt;a class="reference external" href="http://git.openstack.org/cgit/openstack/tripleo-image-elements/tree/elements/network-utils/bin/ensure-bridge"&gt;http://git.openstack.org/cgit/openstack/tripleo-image-elements/tree/elements/network-utils/bin/ensure-bridge&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;init-neutron-ovs: &lt;a class="reference external" href="http://git.openstack.org/cgit/openstack/tripleo-image-elements/tree/elements/neutron-openvswitch-agent/bin/init-neutron-ovs"&gt;http://git.openstack.org/cgit/openstack/tripleo-image-elements/tree/elements/neutron-openvswitch-agent/bin/init-neutron-ovs&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;The problem with the existing scripts is that their feature set is extremely
prescriptive and limited. Today we only support bridging a single NIC
onto an OVS bridge, VLAN support is limited, and more advanced configuration
(even of common attributes such as the MTU) is not possible.&lt;/p&gt;
&lt;p&gt;Furthermore, we want some level of control over how networking changes
are made and whether they are persistent. In this regard a provider layer
would be useful so that users can choose between, for example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ifcfg/eni scripts: used where persistence is required and we want
to configure interfaces using the distro supported defaults&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;iproute2: used to provide optimized/streamlined network configuration
which may or may not also include persistence&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Our capabilities are currently limited to the extent that we are unable
to fully provision our TripleO CI overclouds without making manual
changes and/or hacks to images themselves. As such we need to
expand our host level network capabilities.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;Create a new python project which encapsulates host level network configuration.&lt;/p&gt;
&lt;p&gt;This will likely consist of:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;an internal python library to facilitate host level network configuration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;a binary which processes a YAML (or JSON) format and makes the associated
python library calls to configure host level networking.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;By following this design the tool should work well with Heat driven
metadata and provide us the future option of moving some of the
library code into Oslo (oslo.network?) or perhaps Neutron itself.&lt;/p&gt;
&lt;p&gt;The tool will support a “provider” layer such that multiple implementations
can drive the host level network configuration (iproute2, ifcfg, eni).
This is important because as new network config formats are adopted
by distributions we may want to gradually start making use of them
(thinking ahead to systemd.network for example).&lt;/p&gt;
&lt;p&gt;The tool will also need to be extensible such that we can add new
configuration options over time. We may for example want to add
more advanced bonding options at a later point in time, and
this should be as easy as possible.&lt;/p&gt;
&lt;p&gt;The focus of the tool initially will be host level network configuration
for existing TripleO features (interfaces, bridges, vlans) in a much
more flexible manner. While we support these things today in a prescriptive
manner the new tool will immediately support multiple bridges, interfaces,
and vlans that can be created in an ad-hoc manner. Heat templates can be
created to drive common configurations and people can customize those
as needed for more advanced networking setups.&lt;/p&gt;
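&lt;p&gt;As a rough illustration of such an ad-hoc setup, the sketch below defines two
OVS bridges, one of which also carries a tagged VLAN. It reuses the keys of the
example format given under Work Items. The vlan type, its vlan_id key, the
br-ex bridge and the em2 interface are assumptions made purely for
illustration; this spec does not define them.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;network_config:
  -
    type: ovs_bridge
    name: br-ctlplane
    use_dhcp: true
    members:
      -
        type: interface
        name: em1
  -
    # Second bridge; the name and its member interface are hypothetical.
    type: ovs_bridge
    name: br-ex
    use_dhcp: false
    members:
      -
        type: interface
        name: em2
      -
        # Hypothetical VLAN member; the type and key names are illustrative only.
        type: vlan
        vlan_id: 100
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;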
&lt;p&gt;The initial implementation will focus on persistent configuration formats
for ifcfg and eni, like we do today via ensure-bridge. This will help us
continue to make steps towards bringing bare metal machines back online
after a power outage (providing a static IP for the DHCP server for example).&lt;/p&gt;
&lt;p&gt;The primary focus of this tool should always be host level network
configuration and fine tuning that we can’t easily do within Neutron itself.
Over time the scope and concept of the tool may shift as Neutron features are
added and/or subtracted.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;One alternative is to keep expanding ensure-bridge and init-neutron-ovs
which would require a significant number of new bash options and arguments to
configure all the new features (vlans, bonds, etc.).&lt;/p&gt;
&lt;p&gt;Many of the deployment projects within the OpenStack ecosystem are doing
similar sorts of networking today. Consider:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Chef/Crowbar: &lt;a class="reference external" href="https://github.com/opencrowbar/core/blob/master/chef/cookbooks/network/recipes/default.rb"&gt;https://github.com/opencrowbar/core/blob/master/chef/cookbooks/network/recipes/default.rb&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fuel: &lt;a class="reference external" href="https://github.com/stackforge/fuel-library/tree/master/deployment/puppet/l23network"&gt;https://github.com/stackforge/fuel-library/tree/master/deployment/puppet/l23network&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VDSM (GPL): contains code to configure interfaces, both ifcfg and iproute2 abstractions (git clone &lt;a class="reference external" href="http://gerrit.ovirt.org/p/vdsm.git"&gt;http://gerrit.ovirt.org/p/vdsm.git&lt;/a&gt;, then look at vdsm/vdsm/network/configurators)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Netconf: heavy handed for this perhaps but interesting (OpenDaylight, etc)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Most of these options are undesirable because they would add a significant
number of dependencies to TripleO.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The configuration data used by this tool is already admin-oriented in
nature and will continue to be provided by Heat. As such there should
be no user facing security concerns with regards to access to the
configuration data that aren’t already present.&lt;/p&gt;
&lt;p&gt;This implementation will directly impact the low level network connectivity
in all layers of TripleO including the seed, undercloud, and overcloud
networks. Any of the host level networking that isn’t already provided
by Neutron is likely affected.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;This feature enables deployers to build out more advanced undercloud and
overcloud networks and as such should help improve the reliability and
performance of the fundamental host network capabilities in TripleO.&lt;/p&gt;
&lt;p&gt;End users should benefit from these efforts.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;This feature will allow us to build better/more advanced networks and as
such should help improve performance. In particular the interface bonding
and VLAN support should help in this regard.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Dan Prince (dan-prince on Launchpad)&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create project on GitHub: os-net-config&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Import project into openstack-infra, get unit tests gating, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build a python library to configure host level networking with
an initial focus on parity with what we already have including things
we absolutely need for our TripleO CI overcloud networks.&lt;/p&gt;
&lt;p&gt;The library will consist of an object model which will allow users to
create interfaces, bridges, vlans, and (optionally) bonds. Each of
these types will act as a container for address objects (IPv4 and IPv6)
and routes (multiple routes may be defined). Additionally, each
object will include options to enable/disable DHCP and set the MTU.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create provider layers for ifcfg/eni. The providers take an object
model and apply it (“make it so”). The ifcfg provider will write out
persistent config files in /etc/sysconfig/network-scripts/ifcfg-&amp;lt;name&amp;gt;
and use ifup/ifdown to start and stop the interfaces when a change
has been made. The eni provider will write out configurations to
/etc/network/interfaces and likewise use ifup/ifdown to start and
stop interfaces when a change has been made.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a provider layer for iproute2. Optional, can be done at
a later time. This provider will most likely not use persistent
formats and will run various ip/vconfig/route commands to
configure host level networking for a given object model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a binary that processes a YAML config file format and makes
the correct python library calls. The binary should be idempotent
in that running the binary once with a given configuration should
“make it so”. Running it a second time with the same configuration
should do nothing (i.e. it is safe to run multiple times). The example
YAML configuration below describes a single
OVS bridge with an attached interface, which matches what
ensure-bridge creates today:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="nt"&gt;network_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ovs_bridge&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;br-ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;use_dhcp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;true&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ovs_extra&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;br-set-external-id br-ctlplane bridge-id br-ctlplane&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;members&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;interface&lt;/span&gt;&lt;span class="w"/&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;em1&lt;/span&gt;&lt;span class="w"/&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;The above format uses a nested approach to define an interface
attached to a bridge.&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;TripleO element to install os-net-config. Most likely using
pip (but we may use git initially until it is released).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wire this up to TripleO…get it all working together using the
existing Heat metadata formats. This would include any documentation
changes to tripleo-incubator, deprecating old elements, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TripleO heat template changes to use the new YAML/JSON formats. Our default
configuration would most likely do exactly what we do today (OVS bridge
with a single attached interface). We may want to create some other example
heat templates which can be used in other environments (multi-bridge
setups like we use for our CI overclouds for example).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
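&lt;p&gt;To make the object model described in the work items more concrete, the
following sketch shows how a bridge might act as a container for a static
address, a route and an MTU setting. The key names addresses, routes,
ip_netmask, next_hop and mtu are assumptions chosen for illustration and are
not fixed by this spec; the IP ranges are documentation-only example values.&lt;/p&gt;
&lt;div class="highlight-yaml notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;network_config:
  -
    type: ovs_bridge
    name: br-ctlplane
    # Static configuration instead of DHCP.
    use_dhcp: false
    # Hypothetical key for the MTU option.
    mtu: 1500
    # Hypothetical address objects; IPv6 entries could be listed the same way.
    addresses:
      -
        ip_netmask: 192.0.2.10/24
    # Hypothetical route objects; multiple routes may be listed.
    routes:
      -
        ip_netmask: 198.51.100.0/24
        next_hop: 192.0.2.1
    members:
      -
        type: interface
        name: em1
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Nesting addresses and routes under the object they belong to mirrors the
nested members approach of the example above, so a provider could apply each
object together with its attributes in one pass.&lt;/p&gt;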
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Existing TripleO CI will help ensure that as we implement this we maintain
parity with the current feature set.&lt;/p&gt;
&lt;p&gt;The ability to provision and make use of our TripleO CI clouds without
custom modifications/hacks will also be a proving ground for much of
the work here.&lt;/p&gt;
&lt;p&gt;Additional manual testing may be required for some of the more advanced
modes of operation (bonding, VLANs, etc.).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The recommended heat metadata used for network configuration may
change as a result of this feature. Older formats will be preserved for
backwards compatibility.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Notes from the Atlanta summit session on this topic can be found
here (includes possible YAML config formats):&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-network-configuration"&gt;https://etherpad.openstack.org/p/tripleo-network-configuration&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
</description><pubDate>Sat, 21 Mar 2015 00:00:00 </pubDate></item><item><title>Cinder HA</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/kilo/cinder_ha.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-kilo-cinder-ha"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-kilo-cinder-ha&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Ensure Cinder volumes remain available if one or multiple nodes running
Cinder services or hosting volumes go down.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;TripleO currently deploys Cinder without shared storage, balancing requests
amongst the nodes. Should one of the nodes running &lt;cite&gt;cinder-volume&lt;/cite&gt; fail,
requests for volumes hosted by that node will fail as well. In addition,
without shared storage, should a disk of any of the &lt;cite&gt;cinder-volume&lt;/cite&gt; nodes
fail, volumes hosted by that node would be lost forever.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;We aim to introduce support for the configuration of Cinder’s Ceph backend
driver and for the deployment of Ceph storage for use with Cinder.&lt;/p&gt;
&lt;p&gt;Such a scenario will install &lt;cite&gt;ceph-osd&lt;/cite&gt; on an arbitrary number of Ceph storage
nodes and &lt;cite&gt;cinder-api&lt;/cite&gt;, &lt;cite&gt;cinder-scheduler&lt;/cite&gt;, &lt;cite&gt;cinder-volume&lt;/cite&gt; and &lt;cite&gt;ceph-mon&lt;/cite&gt; on
the controller nodes, allowing users to scale out the Ceph storage nodes
independently from the controller nodes.&lt;/p&gt;
&lt;p&gt;To ensure HA of the volumes, these will then be hosted on the Ceph storage. To
achieve HA for the &lt;cite&gt;cinder-volume&lt;/cite&gt; service, all Cinder nodes will use a
shared string as their &lt;cite&gt;host&lt;/cite&gt; config setting so that they will be able to operate
on the entire (and shared) set of volumes.&lt;/p&gt;
&lt;p&gt;Support for configuration of more drivers could be added later.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;An alternative approach could be to deploy the &lt;cite&gt;cinder-volume&lt;/cite&gt; services in an
active/standby configuration. This would allow us to support scenarios where the
storage is not shared amongst the Cinder nodes, one example being
LVM over shared Fibre Channel LUNs. Such a scenario has
downsides though: it would not allow us to scale out and balance traffic over the
storage nodes as easily, and it may be prone to issues related to iSCSI session
management on failover.&lt;/p&gt;
&lt;p&gt;A different scenario, based instead on the combined usage of LVM and DRBD, could
be imagined too. Yet this would suffer from downsides as well. The deployment
program would be put in charge of managing the replicas and would probably be required to
have some understanding of the replicas’ status as well. Ceph already handles these
concerns itself, along with related problems such as data
rebalancing and replica recreation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;By introducing support for the deployment of the Ceph tools, we will have to
secure the Ceph services.&lt;/p&gt;
&lt;p&gt;We will allow access to the data hosted by Ceph only to authorized hosts via
usage of &lt;cite&gt;cephx&lt;/cite&gt; for authentication, distributing the &lt;cite&gt;cephx&lt;/cite&gt; keyrings on the
relevant nodes. Controller nodes will be provisioned with the &lt;cite&gt;ceph.mon&lt;/cite&gt;
keyring, the &lt;cite&gt;client.admin&lt;/cite&gt; keyring and the &lt;cite&gt;client.cinder&lt;/cite&gt; keyring.
Compute nodes will be provisioned with the &lt;cite&gt;client.cinder&lt;/cite&gt; secret in libvirt, and
the Ceph storage nodes will be provisioned with the &lt;cite&gt;client.admin&lt;/cite&gt;
keyring.&lt;/p&gt;
&lt;p&gt;Note that the monitors should not be reachable from the public
network, despite being hosted on the Controllers. Also, Cinder won’t need
access to the monitors’ keyring or the &lt;cite&gt;client.admin&lt;/cite&gt; keyring, but
those will be present on the same hosts because the Controllers also run the Ceph monitor
service; the Cinder config will not contain any information about them though.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Cinder volumes as well as Cinder services will remain available despite failure
of one (or more depending on scaling setting) of the Controller nodes or Ceph
storage nodes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;The &lt;cite&gt;cinder-api&lt;/cite&gt; services will remain balanced and the Controller nodes will be
relieved of the LVM-file overhead and the iSCSI traffic, so this topology should,
as an additional benefit, improve performance.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Automated setup of Cinder HA will require the deployment of Ceph.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To take advantage of a pre-existing Ceph installation instead of deploying it
via TripleO, deployers will have to provide the input data needed to configure
Cinder’s backend driver appropriately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It will be possible to scale the number of Ceph storage nodes at any time, as
well as the number of Controllers (running &lt;cite&gt;cinder-volume&lt;/cite&gt;), but changing the
backend driver won’t be supported as there are no plans to support volume
migration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not all Cinder drivers support the scenario where multiple instances of the
&lt;cite&gt;cinder-volume&lt;/cite&gt; service use a shared &lt;cite&gt;host&lt;/cite&gt; string; notably, the default LVM
driver does not. We will use this setting only when appropriate config params
are found in the Heat template, as happens today with the param called
&lt;cite&gt;include_nfs_backend&lt;/cite&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ceph storage nodes, running the &lt;cite&gt;ceph-osd&lt;/cite&gt; service, use the network to
maintain replica consistency and as such may transfer large amounts of
data over the network. Ceph allows the OSD service to differentiate
between a public network and a cluster network for this purpose. This spec
does not introduce support for a dedicated cluster network,
but we want a follow-up spec to implement that support later.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Cinder will continue to be configured with the LVM backend driver by default.&lt;/p&gt;
&lt;p&gt;Developers interested in testing Cinder with the Ceph shared storage will have
to use an appropriate scaling setting for the Ceph storage nodes.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;gfidente&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jprovazn&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;add support for deployment of Cinder’s Ceph backend driver&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add support for deployment of the Ceph services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add support for external configuration of Cinder’s Ceph backend driver&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Will be testable in CI when support for the deployment of the shared Ceph
storage nodes becomes available in TripleO itself.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will need to provide documentation on how users can deploy Cinder together
with the Ceph storage nodes, and on how they can instead use a
pre-existing Ceph deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;juno mid-cycle meetup
kilo design session, &lt;a class="reference external" href="https://etherpad.openstack.org/p/tripleo-kilo-l3-and-cinder-ha"&gt;https://etherpad.openstack.org/p/tripleo-kilo-l3-and-cinder-ha&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Tue, 17 Mar 2015 00:00:00 </pubDate></item><item><title>Enable Neutron DVR on overcloud in TripleO</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/kilo/tripleo-enable-dvr.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/support-neutron-dvr"&gt;https://blueprints.launchpad.net/tripleo/+spec/support-neutron-dvr&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Neutron distributed virtual routing should be able to be configured in TripleO.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Enabling distributed virtual routing in Neutron requires
several changes to the current TripleO overcloud deployment.  The overcloud
compute node(s) are constructed with the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-openvswitch-agent&lt;/span&gt;&lt;/code&gt; image
element, which provides the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-openvswitch-agent&lt;/span&gt;&lt;/code&gt; on the compute node.
In order to support distributed virtual routing, the compute node(s) must also
have the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-metadata-agent&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-l3-agent&lt;/span&gt;&lt;/code&gt; installed. The
installation of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-l3-agent&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-dhcp-agent&lt;/span&gt;&lt;/code&gt; will also
need to be decoupled.&lt;/p&gt;
&lt;p&gt;Additionally, for distributed virtual routing to be enabled, the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron.conf&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;l3_agent.ini&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ml2_conf.ini&lt;/span&gt;&lt;/code&gt; all need to have
additional settings.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;In the tripleo-image-elements, move the current &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-network-node&lt;/span&gt;&lt;/code&gt; element
to an element named &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-router&lt;/span&gt;&lt;/code&gt;, which will be responsible for doing the
installation and configuration work required to install the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-l3-agent&lt;/span&gt;&lt;/code&gt;
and the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-metadata-agent&lt;/span&gt;&lt;/code&gt;. This &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-router&lt;/span&gt;&lt;/code&gt; element will list
the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-openvswitch-agent&lt;/span&gt;&lt;/code&gt; in its element-deps.  The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-network&lt;/span&gt;
&lt;span class="pre"&gt;-node&lt;/span&gt;&lt;/code&gt; element will then become simply a ‘wrapper’ whose sole purpose is to list
the dependencies required for a network node (neutron, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-dhcp-agent&lt;/span&gt;&lt;/code&gt;,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-router&lt;/span&gt;&lt;/code&gt;, os-refresh-config).&lt;/p&gt;
&lt;p&gt;Additionally, in the tripleo-image-elements/neutron element, the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron.conf&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;l3_agent.ini&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;plugins/ml2/ml2_conf.ini&lt;/span&gt;&lt;/code&gt; will be
modified to add the configuration variables required in each to support
distributed virtual routing (the required configuration variables are listed at
&lt;a class="reference external" href="https://wiki.openstack.org/wiki/Neutron/DVR/HowTo#Configuration"&gt;https://wiki.openstack.org/wiki/Neutron/DVR/HowTo#Configuration&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;In the tripleo-heat-templates, the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nova-compute-config.yaml&lt;/span&gt;&lt;/code&gt;,
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nova-compute-instance.yaml&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-source.yaml&lt;/span&gt;&lt;/code&gt; files will be
modified to provide the correct settings for the new distributed virtual routing
variables.  The enablement of distributed virtual routing will be determined by
a ‘NeutronDVR’ variable which will be ‘False’ by default (distributed virtual
routing not enabled) for backward compatibility, but can be set to ‘True’ if
distributed virtual routing is desired.&lt;/p&gt;
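&lt;p&gt;As a rough sketch of how the ‘NeutronDVR’ flag could map onto the three
configuration files, the following assumes the option names router_distributed,
agent_mode, enable_distributed_routing and l2_population as recalled from the
Neutron DVR documentation. They should be checked against the wiki page linked
above. The role labels and function names are hypothetical, and the code is not
the actual template logic.&lt;/p&gt;

```python
import configparser
import io


def dvr_settings(neutron_dvr, role):
    """Map the NeutronDVR flag to per-file settings for one node role.

    role is "compute" or "network" (hypothetical labels). Option names are
    assumptions based on the Neutron DVR documentation, not on this spec.
    """
    if role not in ("compute", "network"):
        raise ValueError("unknown role: %s" % role)
    if neutron_dvr:
        agent_mode = "dvr" if role == "compute" else "dvr_snat"
    else:
        agent_mode = "legacy"
    flag = "True" if neutron_dvr else "False"
    agent_section = {"enable_distributed_routing": flag}
    if neutron_dvr:
        # DVR relies on L2 population being enabled in the OVS agent.
        agent_section["l2_population"] = "True"
    return {
        "neutron.conf": {"DEFAULT": {"router_distributed": flag}},
        "l3_agent.ini": {"DEFAULT": {"agent_mode": agent_mode}},
        "plugins/ml2/ml2_conf.ini": {"agent": agent_section},
    }


def render_ini(sections):
    """Render a {section: {option: value}} mapping as INI text."""
    parser = configparser.ConfigParser()
    parser.read_dict(sections)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

&lt;p&gt;With the flag left at its default of ‘False’, every file keeps the
non-distributed values, which matches the backward-compatibility goal stated
above.&lt;/p&gt;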
&lt;p&gt;Lastly, the tripleo-incubator script &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;devtest_overcloud.sh&lt;/span&gt;&lt;/code&gt; will be modified
to: a) build the overcloud-compute disk-image with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-router&lt;/span&gt;&lt;/code&gt; rather
than with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-openvswitch-agent&lt;/span&gt;&lt;/code&gt;, and b) configure the appropriate
parameter values to be passed in to the heat stack create for the overcloud so
that distributed routing is either enabled or disabled.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We could choose to make no change to the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-network-node&lt;/span&gt;&lt;/code&gt; image-element and
instead include it as well in the list of element arguments to the disk image
build for compute nodes.  This has the undesired effect of also
including/configuring and starting the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-dhcp-agent&lt;/span&gt;&lt;/code&gt; on each compute
node.  Alternatively, it is possible to keep the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-network-node&lt;/span&gt;&lt;/code&gt;
element as it is and create a &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-router&lt;/span&gt;&lt;/code&gt; element which is a copy of
most of the element contents of the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-network-node&lt;/span&gt;&lt;/code&gt; element but without
the dependency on the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-dhcp-agent&lt;/span&gt;&lt;/code&gt; element.  This approach would
introduce a significant amount of code duplication.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Although the TripleO installation does not use FWaaS, enabling DVR is currently
known to break FWaaS.
See &lt;a class="reference external" href="https://blueprints.launchpad.net/neutron/+spec/neutron-dvr-fwaas"&gt;https://blueprints.launchpad.net/neutron/+spec/neutron-dvr-fwaas&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The user will have the ability to set an environment variable during install
which will determine whether distributed virtual routing is enabled or not.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None identified&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The option to enable or disable distributed virtual routing at install time will
be added.  By default distributed virtual routing will be disabled.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None identified&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Erik Colnick (erikcolnick on Launchpad)&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;None&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Create &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-router&lt;/span&gt;&lt;/code&gt; element in tripleo-image-elements and move related
contents from &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-network-node&lt;/span&gt;&lt;/code&gt; element.  Remove the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-dhcp-agent&lt;/span&gt;&lt;/code&gt; dependency from the element-deps of the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-router&lt;/span&gt;&lt;/code&gt; element.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-router&lt;/span&gt;&lt;/code&gt; element as a dependency in the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-network-node&lt;/span&gt;&lt;/code&gt; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;element-deps&lt;/span&gt;&lt;/code&gt; file.  The &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;element-deps&lt;/span&gt;&lt;/code&gt;
file becomes the only content in the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron-network-node&lt;/span&gt;&lt;/code&gt; element.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the configuration values indicated in
&lt;a class="reference external" href="https://wiki.openstack.org/wiki/Neutron/DVR/HowTo#Configuration"&gt;https://wiki.openstack.org/wiki/Neutron/DVR/HowTo#Configuration&lt;/a&gt; to the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron.conf&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;l3_agent.ini&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;ml2_conf.ini&lt;/span&gt;&lt;/code&gt; files in the
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;neutron&lt;/span&gt;&lt;/code&gt; image element.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the necessary reference variables to the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nova-compute-config.yaml&lt;/span&gt;&lt;/code&gt; and
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;nova-compute-instance.yaml&lt;/span&gt;&lt;/code&gt; tripleo-heat-templates files in order to be
able to set the new variables in the config files (from above item).  Add
definitions and default values in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;overcloud-source.yaml&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify tripleo-incubator &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;devtest_overcloud.sh&lt;/span&gt;&lt;/code&gt; script to set the
appropriate environment variables which will drive the configuration of
neutron on the overcloud to either enable distributed virtual routers or
disable distributed virtual routers (with disable as the default).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Existing TripleO CI will help ensure that as this is implemented, the current
feature set is not impacted and that the default behavior of disabled
distributed virtual routers is maintained.&lt;/p&gt;
&lt;p&gt;Additional CI tests which test the installation with distributed virtual
routers should be added as this implementation is completed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Documentation of the new configuration option will be needed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Mon, 10 Nov 2014 00:00:00 </pubDate></item><item><title>TripleO Review Standards</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/kilo/tripleo-review-standards.html</link><description>
 
&lt;p&gt;No launchpad blueprint because this isn’t a spec to be implemented in code.&lt;/p&gt;
&lt;p&gt;Like many OpenStack projects, TripleO generally has more changes incoming to
the projects than it has core reviewers to review and approve those changes.
Because of this, optimizing reviewer bandwidth is important.  This spec will
propose some changes to our review process discussed at the Paris OpenStack
Summit and intended to make the best possible use of core reviewer time.&lt;/p&gt;
&lt;p&gt;There are essentially two major areas that a reviewer looks at when reviewing
a given change: design and implementation.  The design part of the review
covers things like whether the change fits with the overall direction of the
project and whether new code is organized in a reasonable fashion.  The
implementation part of a review will get into smaller details, such as
whether language functionality is being used properly and whether the general
sections of the code identified in the design part of the review do what is
intended.&lt;/p&gt;
&lt;p&gt;Generally design is considered first, and then the reviewer will drill down to
the implementation details of the chosen design.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Many times an overall design for a given change will be agreed upon early in
the change’s lifecycle.  The implementation for the design may then be
tweaked multiple times (due to rebases, or specific issues pointed out by
reviewers) without any changes to the overall design.  Many times these
implementation details are small changes that shouldn’t require much
review effort, but because of our current standard of 2 +2’s on the current
patch set before a change can be approved, reviewers often must unnecessarily
revisit a change even when it is clear that everyone involved in the review
is in favor of it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;section id="overview"&gt;
&lt;h3&gt;Overview&lt;/h3&gt;
&lt;p&gt;When appropriate, allow a core reviewer to approve a change even if the
latest patch set does not have 2 +2’s.  Specifically, this should be used
under the following circumstances:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A change that has had multiple +2’s on past patch sets, indicating an
agreement from the other reviewers that the overall design of the change
is good.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Any further alterations to the change since the patch set(s) with +2’s should
be implementation details only - trivial rebases, minor syntax changes, or
comment/documentation changes.  Any more significant changes invalidate this
option.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As always, core reviewers should use their judgment.  When in doubt, waiting
for 2 +2’s to approve a change is always acceptable, but this new policy is
intended to make it socially acceptable to single-approve a change under the
circumstances described above.&lt;/p&gt;
&lt;p&gt;When approving a change in this manner, it is preferable to leave a comment
explaining why the change is being approved without 2 +2’s.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Allowing a single +2 on “trivial” changes was also discussed, but there were
concerns from a number of people present that such a policy might cause more
trouble than it was worth, particularly since “trivial” changes by nature do
not require much review and therefore don’t take up much reviewer time.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Should be minimal to none.  If a change between patch sets is significant
enough to have a security impact then this policy does not apply.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Core reviewers will spend less time revisiting patches they have already
voted in favor of, and contributors should find it easier to get their
patches merged because they won’t have to wait as long after rebases and
minor changes.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bnemec&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;All cores should review and implement this spec in their reviewing&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;Publish the agreed-upon guidelines somewhere more permanent than a spec.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;A new document will need to be created for core reviewers to reference.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/kilo-tripleo-summit-reviews"&gt;https://etherpad.openstack.org/p/kilo-tripleo-summit-reviews&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Mon, 10 Nov 2014 00:00:00 </pubDate></item><item><title>Unit Testing TripleO Projects</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/unit-testing.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/unit-testing"&gt;https://blueprints.launchpad.net/tripleo/unit-testing&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We should enable more unit testing in TripleO projects to allow better test
coverage of code paths not included in CI, make it easier for reviewers
to verify that a code change does what it is supposed to, and avoid wasting
reviewer and developer time resolving style issues.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Right now there is very little unit testing of the code in most of the TripleO
projects.  This has a few negative effects:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We have no test coverage of any code that isn’t included in our CI runs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the code that is included in CI runs, we don’t actually know how much
of that code is being tested.  There may be many code branches that are not
used during a CI run.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We have no way to test code changes in isolation, which makes it slower to
iterate on them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changes not covered by CI are either not tested at all or must be manually
tested by reviewers, which is tedious and error-prone.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Major refactorings frequently break less commonly used interfaces to tools
because those interfaces are not tested.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additionally, because there are few/no hacking-style checks in the TripleO
projects, many patches get -1’d for style issues that could be caught by
an automated tool.  This causes unnecessary delay in merging changes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;I would like to build out a unit testing framework that simplifies the
process of unit testing in TripleO.  Once that is done, we should start
requiring unit tests for new and changed features like the other OpenStack
projects do.  At that point we can also begin adding test coverage for
existing code.&lt;/p&gt;
&lt;p&gt;The current plan is to make use of Python unit testing libraries to be as
consistent as possible with the rest of OpenStack and make use of the test
infrastructure that already exists.  This will reduce the amount of new code
required and make it easier for developers to begin writing unit tests.&lt;/p&gt;
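&lt;p&gt;To illustrate what a Python unit test for a bash script might look like,
here is a minimal sketch using only the standard library. The script text, the
TARGET_ROOT and EXAMPLE_ENABLED variables and the helper name are all
hypothetical. A real test would target an actual element script and would
probably build on the test libraries already used elsewhere in OpenStack.&lt;/p&gt;

```python
import os
import subprocess
import tempfile
import unittest

# Hypothetical script under test; it writes a config file below TARGET_ROOT.
SCRIPT = """#!/bin/bash
set -eu
mkdir -p "$TARGET_ROOT/etc"
echo "enabled=${EXAMPLE_ENABLED:-false}" | tee "$TARGET_ROOT/etc/example.conf"
"""


def run_script(script_text, env_overrides):
    """Run script_text with bash in a private temp dir.

    Returns (return code, contents of etc/example.conf or None).
    """
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "script.sh")
        with open(path, "w") as handle:
            handle.write(script_text)
        env = dict(os.environ)
        env["TARGET_ROOT"] = root
        env.update(env_overrides)
        proc = subprocess.run(
            ["bash", path],
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        conf = os.path.join(root, "etc", "example.conf")
        content = None
        if os.path.exists(conf):
            with open(conf) as handle:
                content = handle.read()
        return proc.returncode, content


class ExampleScriptTest(unittest.TestCase):
    def test_default_is_disabled(self):
        rc, content = run_script(SCRIPT, {})
        self.assertEqual(0, rc)
        self.assertEqual("enabled=false\n", content)

    def test_override_enables(self):
        rc, content = run_script(SCRIPT, {"EXAMPLE_ENABLED": "true"})
        self.assertEqual(0, rc)
        self.assertEqual("enabled=true\n", content)
```

&lt;p&gt;Because each test runs the script against its own temporary directory,
tests written this way share no state and can run in parallel.&lt;/p&gt;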
&lt;p&gt;For style checking, the dib-lint tool has already been created to catch
common errors in image elements.  More rules should be added to it as we
find problems that can be detected automatically.  It should also be applied
to the tripleo-image-elements project.&lt;/p&gt;
&lt;p&gt;The bashate project also provides some general style checks that would be
useful in TripleO, so we should begin making use of it as well.  We should
also contribute additional checks when possible and provide feedback on any
checks we disagree with.&lt;/p&gt;
&lt;p&gt;Any unit tests added should be able to run in parallel.  This both speeds up
testing and helps find race conditions.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;section id="shell-unit-testing"&gt;
&lt;h4&gt;Shell unit testing&lt;/h4&gt;
&lt;p&gt;Because of the quantity of bash code used in TripleO, we may want to
investigate using a shell unit test framework in addition to Python.  I
think this can be revisited once we are further along in the process and
have a better understanding of how difficult it will be to unit test our
scripts with Python.  I still think we should start with Python for the
reasons above and only add other options if we find something that Python
unit tests can’t satisfy.&lt;/p&gt;
&lt;p&gt;One possible benefit of a shell-specific unit testing framework is that it
could provide test coverage stats so we know exactly what code is and isn’t
being tested.&lt;/p&gt;
&lt;p&gt;If we determine that a shell unit test framework is needed, we should try
to choose a widely-used one with well-understood workflows to ease adoption.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="sandboxing"&gt;
&lt;h4&gt;Sandboxing&lt;/h4&gt;
&lt;p&gt;I have done some initial experimentation with using fakeroot/fakechroot to
sandbox scripts that expect to have access to the root filesystem.  I was
able to run a script that writes to root-owned files as a regular user, making
it think it was writing to the real files, but I haven’t gotten this working
with tox for running unit tests that way.&lt;/p&gt;
&lt;p&gt;Another option would be to use real chroots.  This would provide isolation
and is probably more common than fakeroots.  The drawback would be that
chrooting requires root access on the host machine, so running the unit tests
would as well.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Many scripts in elements assume they will be running as root.  We obviously
don’t want to do that in unit tests, so we need a way to sandbox those scripts
to allow them to run but not affect the test system’s root filesystem.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Adding more tests will increase the amount of time Jenkins gate jobs take.
This should have minimal real impact though, because unit tests should run
in significantly less time than the integration tests.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will need to implement unit tests for their code changes, which
will require learning the unit testing tools we adopt.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bnemec&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;goneri has begun some work to enable dib-lint in tripleo-image-elements.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Provide and document a good Python framework for testing the behavior of
bash scripts.  Use existing functionality in upstream projects where
possible, and contribute new features when necessary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gate tripleo-image-elements on dib-lint, which will require fixing any
lint failures currently in tripleo-image-elements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable bashate in the projects with a lot of bash scripts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add unit-testing to tripleo-incubator to enable verification of things
like &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;devtest.sh&lt;/span&gt; &lt;span class="pre"&gt;--build-only&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add a template validation test job to tripleo-heat-templates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;bashate will be a new test dependency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;These changes should leverage the existing test infrastructure as much as
possible, so the only thing needed to enable the new tests would be changes
to the infra config for the affected projects.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;None of this work should be user-visible, but we may need developer
documentation to help with writing unit tests.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;bashate: &lt;a class="reference external" href="http://git.openstack.org/cgit/openstack-dev/bashate/"&gt;http://git.openstack.org/cgit/openstack-dev/bashate/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There are some notes related to this spec at the bottom of the Summit
etherpad: &lt;a class="reference external" href="https://etherpad.openstack.org/p/juno-summit-tripleo-ci"&gt;https://etherpad.openstack.org/p/juno-summit-tripleo-ci&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Wed, 13 Aug 2014 00:00:00 </pubDate></item><item><title>Dracut Deploy Ramdisks</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/tripleo-juno-dracut-ramdisks.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-dracut-ramdisks"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-dracut-ramdisks&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Our current deploy ramdisks include functionality that is duplicated from
existing tools such as Dracut, and do not include some features that those
tools do.  Reimplementing our deploy ramdisks to use Dracut would shrink
our maintenance burden for that code and allow us to take advantage of those
additional features.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Currently our deploy ramdisks are implemented as a bash script that runs
as init during the deploy process.  This means that we are responsible for
correctly configuring things such as udev and networking which would normally
be handled by distribution tools.  While this isn’t an immediate problem
because the implementation has already been done, it is an unnecessary
duplication and additional maintenance debt for the future as we need to add
or change such low-level functionality.&lt;/p&gt;
&lt;p&gt;In addition, because our ramdisk is a one-off, users will not be able to make
use of any ramdisk troubleshooting methods that they might currently know.
This is an unnecessary burden when there are tools to build ramdisks that are
standardized and well-understood by the people using our software.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;The issues discussed above can be dealt with by using a standard tool such as
Dracut to build our deploy ramdisks.  This will actually result in a reduction
in code that we have to maintain and should be compatible with all of our
current ramdisks because we can continue to use the same method of building
the init script - it will just run as a user script instead of as init (PID 1),
allowing Dracut to do low-level configuration for us.&lt;/p&gt;
&lt;p&gt;Initially this will be implemented alongside the existing ramdisk element to
provide a fallback option if there are any use cases not covered by the
initial version of the Dracut ramdisk.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;For consistency with the rest of Red Hat/Fedora’s ramdisks I would prefer to
implement this using Dracut, but if there is a desire to also make use of
another method of building ramdisks, that could probably be implemented
alongside Dracut.  The current purely script-based implementation could even
be kept in parallel with a Dracut version.  However, I believe Dracut is
available on all of our supported platforms so I don’t see an immediate need
for alternatives.&lt;/p&gt;
&lt;p&gt;Additionally, there is the option to replace our dynamically built init
script with Dracut modules for each deploy element.  This is probably
unnecessary as it is perfectly fine to use the current method with Dracut,
and using modules would tightly couple our deploy ramdisks to Dracut, making
it difficult to use any alternatives in the future.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;The same security considerations that apply to the current deploy ramdisk
would continue to apply to Dracut-built ones.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;This change would enable end users to make use of any Dracut knowledge they
might already have, including the ability to dynamically enable tracing
of the commands used to do the deployment (essentially set -x in bash).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Because Dracut supports more hardware and software configurations, it is
possible there will be some additional overhead during the boot process.
However, I would expect this to be negligible in comparison to the time it
takes to copy the image to the target system, so I see it as a reasonable
tradeoff.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;As noted before, Dracut supports a wide range of hardware configurations,
so deployment methods that currently wouldn’t work with our script-based
ramdisk would become available.  For example, Dracut supports using network
disks as the root partition, so running a diskless node with separate
storage should be possible.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;There would be some small changes to how developers would add a new dependency
to the ramdisk images.  Instead of executables and their required libraries
being copied to the ramdisk manually, the executable can simply be added to
the list of things Dracut will include in the ramdisk.&lt;/p&gt;
&lt;p&gt;Developers would also gain the dynamic tracing ability mentioned above in
the end user impact.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bnemec&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Convert the ramdisk element to use Dracut (see WIP change in References).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify that DHCP booting of ramdisks still works.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify that nova-baremetal ramdisks can be built successfully with Dracut.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify that Ironic ramdisks can be built successfully with Dracut.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify that Dracut can build Ironic-IPA ramdisks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify the Dracut debug shell provides equivalent functionality to the
existing one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide ability for other elements to install additional files to the
ramdisk.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide ability for other elements to include additional drivers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find a way to address potential 32-bit binaries being downloaded and run in
the ramdisk for firmware deployments.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;This would add a dependency on Dracut for building ramdisks.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Since building deploy ramdisks is already part of CI, this should be covered
automatically.  If it is implemented in parallel with another method, then
the CI jobs would need to be configured to exercise the different methods
available.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We would want to document the additional features available in Dracut.
Otherwise this should function in essentially the same way as the current
ramdisks, so any existing documentation will still be valid.&lt;/p&gt;
&lt;p&gt;Some minor developer documentation changes may be needed to address the
different ways Dracut handles adding extra kernel modules and files.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Dracut: &lt;a class="reference external" href="https://dracut.wiki.kernel.org/index.php/Main_Page"&gt;https://dracut.wiki.kernel.org/index.php/Main_Page&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PoC of building ramdisks with Dracut:
&lt;a class="reference external" href="https://review.openstack.org/#/c/105275/"&gt;https://review.openstack.org/#/c/105275/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;openstack-dev discussion:
&lt;a class="reference external" href="http://lists.openstack.org/pipermail/openstack-dev/2014-July/039356.html"&gt;http://lists.openstack.org/pipermail/openstack-dev/2014-July/039356.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Sun, 20 Jul 2014 00:00:00 </pubDate></item><item><title>Haproxy ports and related services configuration</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/haproxy_configuration.html</link><description>
 
&lt;p&gt;Blueprint: &lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-haproxy-configuration"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-haproxy-configuration&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This spec describes options for delivering HA endpoints via haproxy.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The current TripleO deployment scheme binds services on 0.0.0.0:standard_port,
with stunnel configured to listen on the SSL ports.&lt;/p&gt;
&lt;p&gt;This configuration has some drawbacks and won’t work in HA, for several reasons:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;haproxy can’t bind on &amp;lt;vip_address&amp;gt;:&amp;lt;service_port&amp;gt; because the OpenStack services are
already bound to 0.0.0.0:&amp;lt;service_port&amp;gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;service ports are hardcoded in many places (any_service.conf, init-keystone),
so changing them and configuring them from Heat would be painful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the non-SSL endpoint is reachable from outside the local host,
which could confuse users and expose them to an insecure connection
in the case where we want to run that service on SSL only. We want to offer SSL
by default, but with this scheme we can’t really prevent non-SSL access.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;We will bind haproxy, stunnel (SSL), and the OpenStack services to the same
ports but on different IP addresses.&lt;/p&gt;
&lt;p&gt;HAProxy will be bound to VIP addresses only.&lt;/p&gt;
&lt;p&gt;stunnel, where it is used, will be bound to the controller ctlplane address.&lt;/p&gt;
&lt;p&gt;OpenStack services will bind to localhost for SSL only configurations, and to
the ctlplane address for non-SSL or mixed-mode configurations. They will bind
to the standard non-encrypted ports, but will never bind to 0.0.0.0 on any
port.&lt;/p&gt;
&lt;p&gt;We’ll strive to make SSL-only the default.&lt;/p&gt;
&lt;p&gt;An example, using horizon in mixed mode (HTTPS and HTTP):&lt;/p&gt;
&lt;pre&gt;vip_address = 192.0.2.21
node_address = 192.0.2.24&lt;/pre&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;haproxy&lt;/p&gt;
&lt;pre&gt;listen horizon_http
bind vip_address:80
server node_1 node_address:80
listen horizon_https
bind vip_address:443
server node_1 node_address:443&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;stunnel&lt;/p&gt;
&lt;pre&gt;accept node_address:443
connect node_address:80&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;horizon&lt;/p&gt;
&lt;pre&gt;bind node_address:80&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A second example, using horizon in HTTPS only mode:&lt;/p&gt;
&lt;pre&gt;vip_address = 192.0.2.21
node_address = 192.0.2.24&lt;/pre&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;haproxy&lt;/p&gt;
&lt;pre&gt;listen horizon_https
bind vip_address:443
server node_1 node_address:443&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;stunnel&lt;/p&gt;
&lt;pre&gt;accept node_address:443
connect 127.0.0.1:80&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;horizon&lt;/p&gt;
&lt;pre&gt;bind 127.0.0.1:80&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
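&lt;p&gt;The binding rules above can be summarised in a short Python sketch. This is
an illustration only, not TripleO code: the helper name plan_bindings, the
Bindings container, and the default ports are assumptions made for this example.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bindings:
    """Hypothetical container for the addresses each component binds to."""
    haproxy: list   # haproxy listens on the VIP only
    stunnel: str    # stunnel accepts on the ctlplane address
    service: str    # address the OpenStack service itself binds to


def plan_bindings(vip_address, node_address, ssl_only, port=80, ssl_port=443):
    """Return bind addresses following the rules in this section.

    Hypothetical helper for illustration; nothing here binds to 0.0.0.0.
    """
    if ssl_only:
        # SSL-only: the service listens on localhost, and stunnel on the
        # ctlplane address forwards decrypted traffic to it locally.
        return Bindings(
            haproxy=[f"{vip_address}:{ssl_port}"],
            stunnel=f"{node_address}:{ssl_port}",
            service=f"127.0.0.1:{port}",
        )
    # Mixed mode: the service listens on the ctlplane address so that
    # haproxy can reach the plain port as well as the stunnel port.
    return Bindings(
        haproxy=[f"{vip_address}:{port}", f"{vip_address}:{ssl_port}"],
        stunnel=f"{node_address}:{ssl_port}",
        service=f"{node_address}:{port}",
    )
```

&lt;p&gt;Calling plan_bindings("192.0.2.21", "192.0.2.24", ssl_only=True) mirrors the
HTTPS-only horizon example, and ssl_only=False mirrors the mixed-mode one.&lt;/p&gt;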
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;There are several alternatives, none of which covers all the requirements for
security or extensibility.&lt;/p&gt;
&lt;p&gt;Option 1: Assignment of different ports for haproxy, stunnel, openstack services on 0.0.0.0&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;requires additional firewall configuration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;security issue with non-ssl services endpoints&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;haproxy&lt;/p&gt;
&lt;pre&gt;bind :80

listen horizon
server node_1 node_address:8800&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;stunnel&lt;/p&gt;
&lt;pre&gt;accept :8800
connect :8880&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;horizon&lt;/p&gt;
&lt;pre&gt;bind :8880&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Option 2: Using only haproxy ssl termination is suboptimal:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;haproxy 1.5 is still in the development phase, so there are potential stability issues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;we would have to get this into supported distros&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;this also means that there is no SSL between haproxy and real service&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;security issue with non-ssl services endpoints&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;haproxy&lt;/p&gt;
&lt;pre&gt;bind vip_address:80

listen horizon
server node_1 node_address:80&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;horizon&lt;/p&gt;
&lt;pre&gt;bind node_address:80&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Option 3: Add additional ssl termination before load-balancer&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;not useful in the current configuration because the load balancer (haproxy)
and the OpenStack services are installed on the same nodes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Only ssl protected endpoints are publicly available if running SSL only.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Minimal firewall configuration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not forwarding decrypted traffic over non-localhost connections&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;compromise of a control node exposes all external traffic (future and possibly past)
to decryption and/or spoofing&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Several services will listen on the same port, but on different addresses.
This is easy to understand once the user (operator) knows the context.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;No differences between approaches.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;We need to make the service configs - nova etc - know on a per service basis
where to bind. The current approach uses logic in the template to choose
between localhost and my_ip. If we move the selection into Heat this can
become a lot simpler (read a bind address, if set use it, if not don’t).&lt;/p&gt;
&lt;p&gt;We considered extending the connect_ip concept to be on a per service basis.
Right now all services are exposed to both SSL and plain, so this would be
workable until we get a situation where only some services are plain - but we
expect that sooner rather than later.&lt;/p&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;dshulyak&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;p&gt;tripleo-incubator:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;build overcloud-control image with haproxy element&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;tripleo-image-elements:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;openstack-ssl element refactoring&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;refactor services configs to listen on 127.0.0.1 / ctlplane address:
horizon apache configuration, glance, nova, cinder, swift, ceilometer,
neutron, heat, keystone, trove&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;tripleo-heat-templates:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;add haproxy metadata to heat-templates&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;CI testing dependencies:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;use vip endpoints in overcloud scripts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add haproxy element to overcloud-control image (maybe with stats enabled) before
adding haproxy related metadata to heat templates&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;update incubator manual&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;update elements README.md&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://haproxy.1wt.eu/download/1.4/doc/configuration.txt"&gt;http://haproxy.1wt.eu/download/1.4/doc/configuration.txt&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://www.stunnel.org/howto.html"&gt;https://www.stunnel.org/howto.html&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Fri, 18 Jul 2014 00:00:00 </pubDate></item><item><title>TripleO Template and Deployment Plan Storage</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/tripleo-juno-tuskar-template-storage.html</link><description>
 
&lt;p&gt;This design specification describes a storage solution for a deployment plan.
Deployment plans consist of a set of roles, which in turn define a master Heat
template that can be used by Heat to create a stack representing the deployment
plan; and an environment file that defines the parameters needed by the master
template.&lt;/p&gt;
&lt;p&gt;This specification is principally intended to be used by Tuskar.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tuskar/+spec/tripleo-juno-tuskar-template-storage"&gt;https://blueprints.launchpad.net/tuskar/+spec/tripleo-juno-tuskar-template-storage&lt;/a&gt;&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;span id="tripleo-juno-tuskar-template-storage-problem"/&gt;&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;div class="admonition note"&gt;
&lt;p class="admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;The terminology used in this specification is defined in the &lt;a class="reference external" href="https://blueprints.launchpad.net/tuskar/+spec/tripleo-juno-tuskar-plan-rest-api"&gt;Tuskar
REST API&lt;/a&gt; specification.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In order to accomplish the goal of this specification, we need to first define
storage domain models for roles, deployment plans, and associated concepts.
These associated concepts include Heat templates and environment files.  The
models must account for requirements such as versioning and the appropriate
relationships between objects.&lt;/p&gt;
&lt;p&gt;We also need to create a storage mechanism for these models.  The storage
mechanism should be distinct from the domain model, allowing the latter to be
stable while the former retains enough flexibility to use a variety of backends
as need and availability dictate.  Storage requirements for particular models
include items such as versioning and secure storage.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Change Summary&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The following proposed change is split into three sections:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Storage Domain Models: Defines the domain models for templates, environment
files, roles, and deployment plans.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage API Interface: Defines Python APIs that relate the models to
the underlying storage drivers; is responsible for translating stored content
into a model object and vice versa.  Each model requires its own storage
interface.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage Drivers: Defines the API that storage backends need to implement in
order to be usable by the Python API Interface.  Plans for initial and future
driver support are discussed here.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It should be noted that each storage interface will be specified by the user as
part of the Tuskar setup.  Thus, the domain model can assume that the appropriate
storage interfaces - a template store, an environment store, etc - are defined
globally and accessible for use.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Storage Domain Models&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The storage API requires the following domain models:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Template&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Environment File&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Role&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployment Plan&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first two map directly to Heat concepts; the latter two are Tuskar concepts.&lt;/p&gt;
&lt;p&gt;Note that each model will also contain a save method. The save method will call
create on the store if the uuid isn’t set, and will call update on the store
if the instance has a uuid.&lt;/p&gt;
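&lt;p&gt;As a rough illustration of that save behaviour, the sketch below uses a
hypothetical in-memory store. The names InMemoryStore and Model, the explicit
store argument, and the choice to have the store assign the uuid on create are
all assumptions for this example; the spec itself expects stores to be defined
globally.&lt;/p&gt;

```python
import uuid


class InMemoryStore:
    """Hypothetical store exposing the create and update calls named above."""

    def __init__(self):
        self.objects = {}

    def create(self, obj):
        # Assumption: the store assigns the uuid when the object is created.
        obj.uuid = str(uuid.uuid4())
        self.objects[obj.uuid] = obj
        return obj

    def update(self, obj):
        if obj.uuid not in self.objects:
            raise KeyError(obj.uuid)
        self.objects[obj.uuid] = obj
        return obj


class Model:
    """Base-class sketch: save() dispatches on whether uuid is set."""

    def __init__(self, store, name):
        self.store = store
        self.name = name
        self.uuid = None

    def save(self):
        # No uuid yet means the object was never stored, so create it;
        # otherwise update the existing stored object.
        if self.uuid is None:
            return self.store.create(self)
        return self.store.update(self)
```

&lt;p&gt;With this sketch, the first call to save() on a new object stores it and
sets its uuid; later calls update the same stored object rather than creating
another one.&lt;/p&gt;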
&lt;p&gt;&lt;strong&gt;Template Model&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The template model represents a Heat template.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;integer&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

    &lt;span class="c1"&gt;# This is derived from the content from within the template store.&lt;/span&gt;
    &lt;span class="n"&gt;parameters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;parameter&lt;/span&gt; &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;their&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;defaults&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Environment File Model&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The environment file defines the parameters and resource registry for a Heat
stack.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EnvironmentFile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

    &lt;span class="c1"&gt;# These are derived from the content from within the environment file store.&lt;/span&gt;
    &lt;span class="n"&gt;resource_registry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="n"&gt;names&lt;/span&gt;
    &lt;span class="n"&gt;parameters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;parameter&lt;/span&gt; &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;their&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_provider_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Adds the specified template object to the environment file as a&lt;/span&gt;
        &lt;span class="c1"&gt;# provider resource.  This updates the parameters and resource registry&lt;/span&gt;
        &lt;span class="c1"&gt;# in the content.  The provider resource type will be derived from the&lt;/span&gt;
        &lt;span class="c1"&gt;# template file name.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;remove_provider_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Removes the provider resource that matches the template from the&lt;/span&gt;
        &lt;span class="c1"&gt;# environment file.  This updates the parameters and resource registry&lt;/span&gt;
        &lt;span class="c1"&gt;# in the content.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_parameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params_dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# The key/value pairs in params_dict correspond to parameter names/&lt;/span&gt;
        &lt;span class="c1"&gt;# desired values.  This method updates the parameters section in the&lt;/span&gt;
        &lt;span class="c1"&gt;# content to the values specified in params_dict.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Role Model&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A role is a scalable unit of a cloud.  A deployment plan specifies one or more
roles.  Each role must specify a primary role template.  It must also specify
the dependencies of that template.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;integer&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;role_template_uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;dependent_template_uuids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_role_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the Template with uuid matching role_template_uuid&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_dependent_templates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the list of Templates with uuids matching&lt;/span&gt;
        &lt;span class="c1"&gt;# dependent_template_uuids&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
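&lt;p&gt;The two retrieval methods might look roughly like the following sketch. It
assumes, purely for illustration, that the template store behaves like a dict
keyed by uuid and is passed in explicitly; the real store interface is defined
by the storage API, and only a subset of the model fields is shown.&lt;/p&gt;

```python
class Template:
    """Minimal stand-in for the Template model (subset of its fields)."""

    def __init__(self, uuid, name):
        self.uuid = uuid
        self.name = name


class Role:
    """Sketch of Role retrieval; template_store is a hypothetical dict."""

    def __init__(self, name, role_template_uuid, dependent_template_uuids,
                 template_store):
        self.name = name
        self.role_template_uuid = role_template_uuid
        self.dependent_template_uuids = list(dependent_template_uuids)
        self.template_store = template_store

    def retrieve_role_template(self):
        # Look up the primary template by its uuid.
        return self.template_store[self.role_template_uuid]

    def retrieve_dependent_templates(self):
        # Preserve the order given in dependent_template_uuids.
        return [self.template_store[u] for u in self.dependent_template_uuids]
```

&lt;p&gt;Storing only uuids on the role keeps the role record small and lets the
templates be versioned and stored independently.&lt;/p&gt;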
&lt;p&gt;&lt;strong&gt;Deployment Plan Model&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The deployment plan defines the application to be deployed.  It does so by
specifying a list of roles.  Those roles are used to construct an environment
file that contains the parameters that are needed by the roles’ templates and
the resource registry that registers each role’s primary template as a provider
resource.  A master template is also constructed so that the plan can be
deployed as a single Heat stack.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DeploymentPlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;role_uuids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;Role&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;
    &lt;span class="n"&gt;master_template_uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;environment_file_uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EnvironmentFile&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_roles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the list of Roles with uuids matching role_uuids&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_master_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the Template with uuid matching master_template_uuid&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_environment_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the EnvironmentFile with uuid matching environment_file_uuid&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Adds a Role to the plan.  This operation will modify the master&lt;/span&gt;
        &lt;span class="c1"&gt;# template and environment file through template munging operations&lt;/span&gt;
        &lt;span class="c1"&gt;# specified in a separate spec.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;remove_role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Removes a Role from the plan.  This operation will modify the master&lt;/span&gt;
        &lt;span class="c1"&gt;# template and environment file through template munging operations&lt;/span&gt;
        &lt;span class="c1"&gt;# specified in a separate spec.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_dependent_templates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Returns a list of dependent templates.  This consists of the&lt;/span&gt;
        &lt;span class="c1"&gt;# associated role templates.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Storage API Interface&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Each of the models defined above has its own Python storage interface. These
are manager classes that query and perform CRUD operations against the storage
drivers and return instances of the models for use (with the exception of delete
which returns &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;None&lt;/span&gt;&lt;/code&gt;). The storage interfaces bind the models to the driver
being used; this allows us to store each model in a different location.&lt;/p&gt;
&lt;p&gt;Note that each store also contains a serialize method and a deserialize method.
The serialize method takes the relevant object and returns a dictionary
containing all value attributes; the deserialize method does the reverse.&lt;/p&gt;
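&lt;p&gt;As a rough illustration of that round trip, the sketch below serializes and
deserializes a simplified Template. The attribute subset and the use of a
dataclass are assumptions made for this example, not part of the specification.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
from dataclasses import asdict, dataclass


@dataclass
class Template:
    # Illustrative subset of value attributes (assumed for this sketch).
    uuid: str
    name: str
    version: int
    content: str
    description: str = ""


def serialize(template):
    # Return a plain dictionary containing every value attribute.
    return asdict(template)


def deserialize(object_dict):
    # Rebuild the model instance from its dictionary form.
    return Template(**object_dict)


template = Template(uuid="1234", name="compute.yaml", version=0, content="resources: {}")
round_tripped = deserialize(serialize(template))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Because the dictionary holds only plain values, a driver can store it as a
flat file or a table row without knowing anything about the model class.&lt;/p&gt;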
&lt;p&gt;The drivers are discussed in
&lt;a class="reference internal" href="#tripleo-juno-tuskar-template-storage-drivers"&gt;&lt;span class="std std-ref"&gt;the next section&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Template API&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TemplateStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Creates a Template.  If no template exists with a matching name,&lt;/span&gt;
        &lt;span class="c1"&gt;# the template version is set to 0; otherwise it is set to the&lt;/span&gt;
        &lt;span class="c1"&gt;# greatest existing version plus one.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the Template with the specified uuid.  Queries a Heat&lt;/span&gt;
        &lt;span class="c1"&gt;# template parser for template parameters and dependent template names.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_by_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the Template with the specified name and version.  If no&lt;/span&gt;
        &lt;span class="c1"&gt;# version is specified, retrieves the latest version of the Template.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Deletes the Template with the specified uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;only_latest&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Returns a list of all Templates.  If only_latest is True, filters&lt;/span&gt;
        &lt;span class="c1"&gt;# the list to the latest version of each Template name.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
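&lt;p&gt;The versioning rules in the comments above can be sketched with an in-memory
store. The class name and the dictionary representation of a template are
hypothetical; a real store would delegate persistence to a storage driver and
return model instances.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import uuid as uuidlib


class InMemoryTemplateStore:
    # Hypothetical stand-in for TemplateStore, used only to show versioning.

    def __init__(self):
        self._templates = {}  # uuid -> template dictionary

    def create(self, name, content, description=None):
        # First template with this name gets version 0; later ones get
        # the greatest existing version plus one.
        versions = [t["version"] for t in self._templates.values() if t["name"] == name]
        version = max(versions) + 1 if versions else 0
        template = {
            "uuid": str(uuidlib.uuid4()),
            "name": name,
            "version": version,
            "content": content,
            "description": description,
        }
        self._templates[template["uuid"]] = template
        return template

    def retrieve_by_name(self, name, version=None):
        # Without an explicit version, return the latest one.
        matches = [t for t in self._templates.values() if t["name"] == name]
        if version is not None:
            matches = [t for t in matches if t["version"] == version]
        if not matches:
            raise KeyError(name)
        return max(matches, key=lambda t: t["version"])

    def list(self, only_latest=False):
        templates = sorted(self._templates.values(), key=lambda t: (t["name"], t["version"]))
        if not only_latest:
            return templates
        latest = {}
        for template in templates:
            latest[template["name"]] = template  # later versions overwrite earlier ones
        return [latest[name] for name in sorted(latest)]
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Creating two templates named compute.yaml yields versions 0 and 1, and
retrieving by name without a version returns the second one.&lt;/p&gt;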
&lt;p&gt;&lt;strong&gt;Environment File API&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The environment file requires secure storage to protect parameter values.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EnvironmentFileStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Creates an empty EnvironmentFile.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the EnvironmentFile with the specified uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Updates an EnvironmentFile.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Deletes the EnvironmentFile with the specified uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Returns a list of all EnvironmentFiles.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Role API&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RoleStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role_template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;template_uuid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Creates a Role.  If no role exists with a matching name, the&lt;/span&gt;
        &lt;span class="c1"&gt;# template version is set to 0; otherwise it is set to the greatest&lt;/span&gt;
        &lt;span class="c1"&gt;# existing version plus one.&lt;/span&gt;
        &lt;span class="c1"&gt;#&lt;/span&gt;
        &lt;span class="c1"&gt;# Dependent templates are derived from the role_template.  The&lt;/span&gt;
        &lt;span class="c1"&gt;# create method will take all dependent template names from&lt;/span&gt;
        &lt;span class="c1"&gt;# role_template, retrieve the latest version of each from the&lt;/span&gt;
        &lt;span class="c1"&gt;# TemplateStore, and use those as the dependent template list.&lt;/span&gt;
        &lt;span class="c1"&gt;#&lt;/span&gt;
        &lt;span class="c1"&gt;# If a dependent template is missing from the TemplateStore, then&lt;/span&gt;
        &lt;span class="c1"&gt;# an exception is raised.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the Role with the specified uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_by_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the Role with the specified name and version.  If no&lt;/span&gt;
        &lt;span class="c1"&gt;# version is specified, retrieves the latest version of the Role.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Updates a Role.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Deletes the Role with the specified uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;only_latest&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Returns a list of all Roles.  If only_latest is True, filters&lt;/span&gt;
        &lt;span class="c1"&gt;# the list to the latest version of each Role.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Deployment Plan API&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DeploymentPlanStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Creates a DeploymentPlan.  Also creates an associated empty master&lt;/span&gt;
        &lt;span class="c1"&gt;# Template and EnvironmentFile; these will be modified as Roles are&lt;/span&gt;
        &lt;span class="c1"&gt;# added to or removed from the plan.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves the DeploymentPlan with the specified uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Updates a DeploymentPlan.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Deletes the DeploymentPlan with the specified uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieves a list of all DeploymentPlans.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p id="tripleo-juno-tuskar-template-storage-drivers"&gt;&lt;strong&gt;Storage Drivers&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Storage drivers operate by storing object dictionaries.  For storage solutions
such as Glance these dictionaries are stored as flat files.  For a storage
solution such as a database, the dictionary is translated into a table row.  It
is the responsibility of the driver to understand how it is storing the object
dictionaries.&lt;/p&gt;
&lt;p&gt;Each storage driver must provide the following methods.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;object_dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Stores the specified object_dict under filename and returns the resulting&lt;/span&gt;
        &lt;span class="c1"&gt;# uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Returns the object_dict matching the uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;object_dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Updates the object_dict specified by the uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Deletes the content specified by the uuid.&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Return a list of all content.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
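&lt;p&gt;To make the driver contract concrete, here is a minimal sketch of a driver
that keeps object dictionaries in memory. The class name is hypothetical and
the driver is only meant for illustration or tests; it is not one of the
drivers proposed for Juno.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import copy
import uuid as uuidlib


class InMemoryDriver:
    # Hypothetical driver: stores object dictionaries in a Python dict
    # instead of a database, Heat, or Glance.

    def __init__(self):
        self._objects = {}  # uuid -> (filename, object_dict)

    def create(self, filename, object_dict):
        # Store a copy so later changes by the caller do not leak in.
        uuid = str(uuidlib.uuid4())
        self._objects[uuid] = (filename, copy.deepcopy(object_dict))
        return uuid

    def retrieve(self, uuid):
        return copy.deepcopy(self._objects[uuid][1])

    def update(self, uuid, object_dict):
        filename, _ = self._objects[uuid]
        self._objects[uuid] = (filename, copy.deepcopy(object_dict))

    def delete(self, uuid):
        del self._objects[uuid]

    def list(self):
        return [copy.deepcopy(obj) for _, obj in self._objects.values()]
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A store would call serialize before create or update and deserialize after
retrieve or list, so the driver never needs to know about the model classes.&lt;/p&gt;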
&lt;p&gt;For Juno, we will aim to use a combination of a relational database and Heat.
Heat will be used for the secure storage of sensitive environment parameters.
Database tables will be used for everything else. The usage of Heat for secure
stores relies on &lt;a class="reference external" href="https://bugs.launchpad.net/heat/+bug/1224828"&gt;PATCH support&lt;/a&gt; being added to the Heat API. This bug is
targeted for completion by Juno-2.&lt;/p&gt;
&lt;p&gt;This is merely a short-term solution, as it is understood that there is some
reluctance in introducing an unneeded database dependency.  In the long-term we
would like to replace the database with Glance once it is updated from an image
store to a more general artifact repository.  However, this feature is currently
in development and cannot be relied on for use in the Juno cycle.  The
architecture described in this specification should allow reasonable ease in
switching from one to the other.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;span id="tripleo-juno-tuskar-template-storage-alternatives"/&gt;&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Modeling Relationships within Heat Templates&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The specification proposes modeling relationships such as a plan’s associated
roles or a role’s dependent templates as direct attributes of the object.
However, this information would appear to be available as part of a plan’s
environment file or by traversing the role template’s dependency graph.  Why
not simply derive the relationships in that way?&lt;/p&gt;
&lt;p&gt;A role is a Tuskar abstraction.  Within Heat, it corresponds to a template used
as a provider resource; however, a role has added requirements, such as the
versioning of itself and its dependent templates, or the ability to list out
available roles for selection within a plan.  These are not requirements that
Heat intends to fulfill, and fulfilling them entirely within Heat feels like an
abuse of mechanics.&lt;/p&gt;
&lt;p&gt;From a practical point of view, modeling relationships within Heat templates
requires the in-place modification of Heat templates by Tuskar to deal with
versioning.  For example, if version 1 of the compute role specifies
{{compute.yaml: 1}, {compute-config.yaml: 1}}, and version 2 of the role
specifies {{compute.yaml: 1}, {compute-config.yaml: 2}}, the only way to
allow both versions of the role to be used is to allow programmatic
modification of compute.yaml to point at the correct version of
compute-config.yaml.&lt;/p&gt;
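&lt;p&gt;By contrast, keeping the version pins as attributes of the role leaves the
templates untouched. The sketch below uses a hypothetical in-memory mapping in
place of Tuskar’s storage to show both versions of the compute role
coexisting.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
# Hypothetical view of two versions of the compute role.  Each role
# version records the template versions it depends on as plain
# attributes, so compute.yaml never has to be rewritten.
ROLES = {
    ("compute", 1): {"compute.yaml": 1, "compute-config.yaml": 1},
    ("compute", 2): {"compute.yaml": 1, "compute-config.yaml": 2},
}


def dependent_template_versions(role_name, role_version):
    # Look up the pinned template versions for one version of a role.
    return ROLES[(role_name, role_version)]


def changed_templates(role_name, old_version, new_version):
    # Report which templates differ between two versions of a role.
    old = dependent_template_versions(role_name, old_version)
    new = dependent_template_versions(role_name, new_version)
    return sorted(name for name in new if old.get(name) != new[name])
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;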
&lt;p&gt;&lt;strong&gt;Swift as a Storage Backend&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Swift was considered as an option to replace the relational database but was
ultimately discounted for two key reasons:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The versioning system in Swift doesn’t provide a static reference to the
current version of an object. Rather, it has the version “latest”, which is
dynamic and changes when a new version is added, so there is no way to
pin a deployment to a version.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We need to create a relationship between the provider resources within a Role
and Swift doesn’t support relationships between stored objects.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Having said that, after we sought guidance from the Swift team, it was
suggested that a naming convention or the use of different containers may
provide us with enough control to mimic a versioning system that meets our
requirements. These suggestions have made Swift more favourable as an option.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;File System as a Storage Backend&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The filesystem was briefly considered and may be included to provide a simpler
developer setup. However, creating a production-ready system with versioning
and relationships would require re-implementing much of what databases and
other services provide for us. Therefore, this option is reserved for
development setups, which will be missing key features.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Secure Driver Alternatives&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Barbican, the OpenStack secure storage service, provides us with an alternative
if PATCH support isn’t added to Heat in time.&lt;/p&gt;
&lt;p&gt;Currently the only alternative other than Barbican is to implement our own
cryptography with one of the other options listed above. This isn’t a
favourable choice as it adds a technical complexity and risk that should be
beyond the scope of this proposal.&lt;/p&gt;
&lt;p&gt;The other option with regard to sensitive data is to not store any. This would
require the REST API caller to provide the sensitive information each time a
Heat create (and potentially update) is called.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Some of the configuration values, such as service passwords, will be sensitive.
For this reason, Heat or Barbican will be used to store all configuration
values.&lt;/p&gt;
&lt;p&gt;While access will be controlled by the Tuskar API, large files could be provided
in the place of provider resource files or configuration files. Their size
should be checked against a reasonable limit.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;The template storage will be primarily used by the Tuskar API, but as it may be
used directly in the future it will need to be documented.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Storing the templates in Glance and Barbican will lead to API calls over the
local network rather than direct database access. These are likely to have
higher overhead. However, Tuskar is expected to read and write infrequently,
and manipulating a deployment plan will only trigger simple reads and
writes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;TripleO will have access to sensitive and insensitive storage through the
storage API.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;d0ugal&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;tzumainn&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement storage API&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create Glance and Barbican based storage driver&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create database storage driver&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Glance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Barbican&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The API logic will be verified with a suite of unit tests that mock the
external services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tempest will be used for integration testing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The code should be documented with docstrings and comments. If it is used
outside of Tuskar, further user documentation should be developed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/glance/+spec/artifact-repository-api"&gt;https://blueprints.launchpad.net/glance/+spec/artifact-repository-api&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/glance/+spec/metadata-artifact-repository"&gt;https://blueprints.launchpad.net/glance/+spec/metadata-artifact-repository&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://bugs.launchpad.net/heat/+bug/1224828"&gt;https://bugs.launchpad.net/heat/+bug/1224828&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://docs.google.com/document/d/1tOTsIytVWtXGUaT2Ia4V5PWq4CiTfZPDn6rpRm5In7U"&gt;https://docs.google.com/document/d/1tOTsIytVWtXGUaT2Ia4V5PWq4CiTfZPDn6rpRm5In7U&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/juno-hot-artifacts-repository-finalize-design"&gt;https://etherpad.openstack.org/p/juno-hot-artifacts-repository-finalize-design&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://etherpad.openstack.org/p/juno-summit-tripleo-tuskar-planning"&gt;https://etherpad.openstack.org/p/juno-summit-tripleo-tuskar-planning&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://wiki.openstack.org/wiki/Barbican"&gt;https://wiki.openstack.org/wiki/Barbican&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://wiki.openstack.org/wiki/TripleO/TuskarJunoPlanning"&gt;https://wiki.openstack.org/wiki/TripleO/TuskarJunoPlanning&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://wiki.openstack.org/wiki/TripleO/TuskarJunoPlanning/TemplateBackend"&gt;https://wiki.openstack.org/wiki/TripleO/TuskarJunoPlanning/TemplateBackend&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Thu, 17 Jul 2014 00:00:00 </pubDate></item><item><title>SSL PKI</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/ssl_pki.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-ssl-pki"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-ssl-pki&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Each of our clouds requires multiple SSL certificates to operate. We need to
support generating these certificates in devtest in a manner which will
closely resemble the needs of an actual deployment. We also need to support
interfacing with the PKI (Public Key Infrastructure) of existing organizations.
This spec outlines the ways we will address these needs.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;We have a handful of services which require SSL certificates:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Keystone&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Public APIs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Galera replication&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RabbitMQ replication&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;Developers need to have these certificates generated automatically for them,
while organizations will likely want to make use of their existing PKI. We
have not made clear at what level we will manage these certificates and/or
their CA(s) and at what level the user will be responsible for them. This is
further complicated by the Public APIs likely having a different CA than the
internal-only facing services.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;Each of these services will accept their SSL certificate, key, and CA via
environment JSON (heat templates for over/undercloud, config.json for seed).&lt;/p&gt;
&lt;p&gt;At the most granular level, a user can specify these values by editing the
over/undercloud-env.json or config.json files. If both a certificate and a key
are specified for a service, we will not attempt to generate them automatically
for that service. Specifying only a certificate or only a key is considered
an error.&lt;/p&gt;
&lt;p&gt;If neither a certificate nor a key is specified for a service, we will attempt to
generate a certificate and key, and sign the certificate with a self-signed
CA we generate. Both the undercloud and seed will share a self-signed CA in
this scenario, and each overcloud will have a separate self-signed CA. We will
also add this self-signed CA to the chain of trust for hosts which use services
of the cloud being created.&lt;/p&gt;
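&lt;p&gt;The decision described in the two paragraphs above can be sketched as a small
function. The function name and its return values are hypothetical and only
illustrate the three cases; they are not taken from os-cloud-config.&lt;/p&gt;
&lt;div class="highlight-python notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;
```python
def resolve_ssl_material(service, certificate=None, key=None):
    # Decide how a service obtains its certificate and key.
    if certificate and key:
        # Both supplied by the user: use them and generate nothing.
        return "user-supplied"
    if certificate or key:
        # Exactly one of the two was supplied: treat it as an error.
        raise ValueError("%s: certificate and key must be specified together" % service)
    # Neither supplied: generate both and sign with the self-signed CA.
    return "generate"
```
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Rejecting the half-specified case early avoids silently pairing a
user-provided certificate with a generated key that does not match it.&lt;/p&gt;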
&lt;p&gt;The use of a custom CA for signing the automatically generated certificates
will be solved in a future iteration.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None presented thus far.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;This change has high security impact as it affects our PKI. We currently do not
have any SSL support, and implementing this should therefore improve our
security. We should ensure all key files we create in this change have file
permissions of 0600 and that the directories they reside in have permissions
of 0700.&lt;/p&gt;
&lt;p&gt;There are many security implications for SSL key generation (including entropy
availability) and we defer to the OpenStack Security Guide[1] for this.&lt;/p&gt;
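&lt;p&gt;One possible way to apply the 0600 and 0700 permissions mentioned above is
sketched below. This is not the os-cloud-config implementation; the helper name
and its arguments are assumptions made for illustration.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;import os
import stat


def write_private_key(directory, name, pem_data):
    """Write pem_data to directory/name with restrictive permissions."""
    # Create the directory if needed, then force 0700 regardless of umask.
    os.makedirs(directory, mode=0o700, exist_ok=True)
    os.chmod(directory, stat.S_IRWXU)
    path = os.path.join(directory, name)
    # Open with mode 0600 so the key is never briefly readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(pem_data)
    # A pre-existing file keeps its old mode, so set 0600 explicitly.
    os.chmod(path, 0o600)
    return path
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;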
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;Users can interact with this feature by editing the under/overcloud-env.json
files and the seed config.json file. Additionally, the current properties which
are used for specifying the keystone CA and certificate will be changed to
support a more general naming scheme.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;We will be performing key generation which can require a reasonable amount of
resources, including entropy sources.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;More SSL keys will be generated for developers. Debugging via monitoring
network traffic can also be more difficult once SSL is adopted. Production
environments will also require SSL unwrapping to debug network traffic, so this
will allow us to more closely emulate production (developers can now spot
missing SSL wrapping).&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;The code behind generate-keystone-pki in os-cloud-config will be generalized
to support creation of a CA and certificates separately, and support creation
of multiple certificates using a single CA. A new script will be created
named ‘generate-ssl-cert’ which accepts a heat environment JSON file and a
service name. This will add ssl.certificate and ssl.certificate_key properties
under the servicename property (an example is below). If no ssl.ca_certificate
and ssl.ca_certificate_key properties are defined then this script will perform
generation of the self-signed certificate.&lt;/p&gt;
&lt;p&gt;Example heat environment output:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"ssl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"ca_certificate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;PEM Data&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"ca_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;PEM Data&amp;gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="s2"&gt;"horizon"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"ssl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"ca_certificate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;PEM Data&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ca_certificate_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;PEM Data&amp;gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;greghaynes&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Generalize CA/certificate creation in os-cloud-config.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add detection logic for certificate key pairs in -env.json files to devtest&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make devtest scripts call CA/cert creation scripts if no cert is found
for a service&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;The services listed above are not all set up to use SSL certificates yet. This
is required before we can add detection logic for user specified certificates
for all services.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Tests for new functionality will be made to os-cloud-config. The default
behavior for devtest is designed to closely mimic a production setup, allowing
us to best make use of our CI.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will need to document the new interfaces described in ‘Other End User
Impact’.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;OpenStack Security Guide: &lt;a class="reference external" href="http://docs.openstack.org/security-guide/content/"&gt;http://docs.openstack.org/security-guide/content/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</description><pubDate>Mon, 30 Jun 2014 00:00:00 </pubDate></item><item><title>Backwards compatibility and TripleO</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/backwards-compat-policy.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-backwards-compat"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-backwards-compat&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;TripleO has run with good but not perfect backwards compatibility since
creation. It’s time to formalise this in a documentable and testable fashion.&lt;/p&gt;
&lt;p&gt;TripleO will follow Semantic Versioning (aka &lt;a class="reference external" href="http://docs.openstack.org/developer/pbr/semver.html"&gt;semver&lt;/a&gt;) for versioning all
releases. We will strive to avoid breaking backwards compatibility at all, and
if we have to it will be because of extenuating circumstances such as security
fixes with no other way to fix things.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;TripleO has historically run with an unspoken backwards compatibility policy,
but we now have too many people making changes. We need to build a single
clear policy, or else our contributors will have to rework things when one
reviewer asks for backwards compat when they thought it was not needed (or, vice
versa, do the work to be backwards compatible when it isn’t needed).&lt;/p&gt;
&lt;p&gt;Secondly, because we haven’t marked any of our projects as 1.0.0 there is no
way for users or developers to tell when and where backwards compatibility is
needed / appropriate.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;Adopt the following high level heuristics for identifying backwards
incompatible changes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Making changes that break user code that scripts or uses a public interface.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Becoming unable to install something we could previously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Being unable to install something because someone else has altered things -
e.g. being unable to install F20 if it no longer exists on the internet
is not an incompatible change - if it were returned to the net, we’d be able
to install it again. If we remove the code to support this thing, then we’re
making an incompatible change. The one exception here is unsupported
projects - e.g. unsupported releases of OpenStack, or Fedora, or Ubuntu.
Because unsupported releases are security issues, and we expect most of our
dependencies to do releases and stop supporting things, we will not treat
cleaning up code only needed to support such an unsupported release as
backwards incompatible. For instance, breaking the ability to deploy a previous
&lt;em&gt;still supported&lt;/em&gt; OpenStack release where we had previously been able to
deploy it is a backwards incompatible change, but breaking the ability to
deploy an &lt;em&gt;unsupported&lt;/em&gt; OpenStack release is not.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Corollaries to these principles:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Breaking a public API (network or Python). The public API of a project is
any released API (e.g. not explicitly marked alpha/beta/rc) in a version that
is &amp;gt;= 1.0.0. For Python projects, a _ prefix marks a namespace as non-public
e.g. in &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;foo.\_bar.quux&lt;/span&gt;&lt;/code&gt; &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;quux&lt;/span&gt;&lt;/code&gt; is not public because it’s in a non-public
namespace. For our projects that accept environment variables, if the
variable is documented (in the README.md/user documentation) then the variable
is part of the public interface. Otherwise it is not.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Increasing the set of required parameters to Heat templates. This breaks
scripts that use TripleO to deploy. Note that adding new parameters which
need to be set when deploying &lt;em&gt;new&lt;/em&gt; things is fine because the user is
doing more than just pulling in updated code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decreasing the set of accepted parameters to Heat templates. Likewise, this
breaks scripts using the Heat templates to do deploys. If the parameters are
no longer accepted because they are for no longer supported versions of
OpenStack then that is covered by the carve-out above.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Increasing the required metadata to use an element except when both Tuskar
and tripleo-heat-templates have been updated to use it. There is a
bi-directional dependency from t-i-e to t-h-t and back - when we change
signals in the templates we have to update t-i-e first, and when we change
parameters to elements we have to alter t-h-t first. We could choose to make
t-h-t and t-i-e completely independent, but don’t believe that is a sensible
use of time - they are closely connected, even though loosely coupled.
Instead we’re treating them as a single unit: at any point in time t-h-t can
only guarantee to deploy images built from some minimum version of t-i-e,
and t-i-e can only guarantee to be deployed with some minimum version of
t-h-t. The public API here is t-h-t’s parameters, and the link to t-i-e
is equivalent to the dependency on a helper library for a Python
library/program: requiring new minor versions of the helper library is not
generally considered to be an API break of the calling code. Upgrades will
still work with this constraint - machines will get a new image at the same
time as new metadata, with a rebuild in the middle. Downgrades / rollback
may require switching to an older template at the same time, but that was
already the case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decreasing the accepted metadata for an element if that would result in an
error or misbehaviour.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
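&lt;p&gt;The Python naming rule in the first corollary can be illustrated with a
small sketch. The helper name is hypothetical; it only encodes the underscore
convention described above and knows nothing about release versions.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;def is_public(dotted_name):
    """Return True if no component of the dotted name starts with "_"."""
    return not any(part.startswith("_") for part in dotted_name.split("."))
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Under this sketch, foo.bar.quux counts as public while foo._bar.quux does
not, because quux lives inside the non-public _bar namespace.&lt;/p&gt;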
&lt;p&gt;Other sorts of changes may also be backwards incompatible, and if identified
will be treated as such - that is, this list is not comprehensive.&lt;/p&gt;
&lt;p&gt;We don’t consider the internal structure of Heat templates to be an API, nor
any test code within the TripleO codebases (whether it may appear to be public
or not).&lt;/p&gt;
&lt;p&gt;TripleO’s incubator is not released and has no backwards compatibility
guarantees - but a point in time incubator snapshot interacts with ongoing
releases of other components - and they will be following semver, which means
that a user wanting stability can get that as long as they don’t change the
incubator.&lt;/p&gt;
&lt;p&gt;TripleO will promote all its component projects to 1.0 within one OpenStack
release cycle of them being created. Projects may not become dependencies of a
project with a 1.0 or greater version until they are at 1.0 themselves. This
restriction serves to prevent version locking (makes upgrades impossible) by
the depending version, or breakage (breaks users) if the pre 1.0 project breaks
compatibility. Adding new projects will involve creating test jobs that test
the desired interactions before the dependency is added, so that the API can
be validated before the new project has reached 1.0.&lt;/p&gt;
&lt;p&gt;Adopt the following rule on &lt;em&gt;when&lt;/em&gt; we are willing to [deliberately] break
backwards compatibility:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;When all known uses of the code are for no longer supported OpenStack
releases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;If the PTL signs off on the break. E.g. a high impact security fix for which&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;we cannot figure out a backwards compatible way to deliver it to our users
and distributors.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We also need to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Set a timeline for new codebases to become mature (one cycle). Existing
codebases will have the clock start when this specification is approved.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set rules for allowing anyone to depend on new codebases (codebase must be
1.0.0).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document what backwards compatible means in the context of heat templates and
elements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add an explicit test job for deploying Icehouse from trunk, because that will
tell us about our ability to deploy currently supported OpenStack versions
which we could previously deploy - that failing would indicate the proposed
patch is backwards incompatible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If needed either fix Icehouse, or take a consensus decision to exclude
Icehouse support from this policy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Commit to preserving backwards compatibility.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When we need alternate codepaths to support backwards compatibility we will
mark them clearly to facilitate future cleanup:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="c1"&gt;# Backwards compatibility: &amp;lt;....&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt;
    &lt;span class="c1"&gt;# Trunk&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt;
    &lt;span class="c1"&gt;# Icehouse&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="c1"&gt;# Havana&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We could say that we don’t do backwards compatibility and release like the
OpenStack API services do, but this makes working with us really difficult
and it also forces folk with stable support desires to work from separate
branches rather than being able to collaborate on a single codebase.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We could treat tripleo-heat-templates and tripleo-image-elements separately
to the individual components and run them under different rules - e.g. using
stable branches rather than semver. But there have been so few times that
backwards compatibility would be hard for us that this doesn’t seem worth
doing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Keeping code around longer may have security considerations, but this is a
well known interaction.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;End users will love us.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None anticipated. Images will be marginally larger due to carrying backwards
compat code around.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Deployers will appreciate not having to rework things. Not that they have had
to, but still.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;Developers will have clear expectations set about backwards compatibility which
will help them avoid being asked to rework things. They and reviewers will need
to look out for backward incompatible changes and special case handling of
them to deliver the compatibility we aspire to.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;lifeless&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Other contributors:&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Draft this spec.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get consensus around it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Release all our non-incubator projects as 1.0.0.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add Icehouse deploy test job. (Because we could install Icehouse at the start
of Juno, and if we get in fast we can keep being able to do so).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None. An argument could be made for doing a quick cleanup of stuff, but the
reality is that it’s not such a burden that we’ve had to clean it up yet.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;To ensure we don’t accidentally break backwards compatibility we should look
at the oslo cross-project matrix eventually - e.g. run os-refresh-config
against older releases of os-apply-config to ensure we’re not breaking
compatibility. Our general policy of building releases of things and using
those goes a long way to giving us good confidence though - we can be fairly
sure of no single-step regressions (but will still have to watch out for
N-step regressions unless some mechanism is put in place).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The user manual and developer guides should reflect this.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;/section&gt;
</description><pubDate>Tue, 24 Jun 2014 00:00:00 </pubDate></item><item><title>os-collect-config local data source</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/tripleo-juno-occ-localdatasource.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo-juno-occ-local-datasource"&gt;https://blueprints.launchpad.net/tripleo-juno-occ-local-datasource&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;os-collect-config needs a local data source collector for configuration data.
This will allow individual elements to drop files into a well-known location to
set the initial configuration data of an instance.&lt;/p&gt;
&lt;p&gt;There is already a heat_local collector, but that uses a single hard coded path
of /var/lib/heat-cfntools/cfn-init-data.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Individual elements can not currently influence the configuration available
to os-apply-config for an instance without overwriting each other.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elements that rely on configuration values that must be set the same at both
image build time and instance run time currently have no way of propagating the
value used at build time to a run time value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elements have no way to specify default values for configuration they may
need at runtime (outside of configuration file templates).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;A new collector class will be added to os-collect-config that collects
configuration data from JSON files in a configurable list of directories with a
well known default of /var/lib/os-collect-config/local-data.&lt;/p&gt;
&lt;p&gt;The collector will return a list of pairs of JSON files and their content,
sorted by the JSON filename in traditional C collation.  For example, if
/var/lib/os-collect-config/local-data contains bar.json and foo.json&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;dl class="simple"&gt;
&lt;dt&gt;[ (‘bar.json’, bar_content),&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;(‘foo.json’, foo_content) ]&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/div&gt;&lt;/blockquote&gt;
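&lt;p&gt;A minimal sketch of such a collector is shown below. It is not the
os-collect-config code; the function name and signature are assumptions, and it
relies on Python comparing strings by code point, which matches C collation for
the file names in question.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;import json
import os

DEFAULT_DIRECTORIES = ["/var/lib/os-collect-config/local-data"]


def collect_local(directories=None):
    """Return a list of (filename, parsed_content) pairs sorted by filename."""
    found = []
    for directory in directories or DEFAULT_DIRECTORIES:
        if not os.path.isdir(directory):
            # A missing directory simply contributes no data.
            continue
        for name in os.listdir(directory):
            if name.endswith(".json"):
                with open(os.path.join(directory, name)) as handle:
                    found.append((name, json.load(handle)))
    # Sort on the filename only, using plain code point ordering.
    return sorted(found, key=lambda pair: pair[0])
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;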
&lt;p&gt;This new collector will be configured first in DEFAULT_COLLECTORS in
os-collect-config. This means all later configured collectors will override any
shared configuration keys from the local datasource collector.&lt;/p&gt;
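&lt;p&gt;The override behaviour can be pictured with the sketch below. It assumes a
simple shallow merge of top-level keys in collector order, purely for
illustration; the real tools may combine nested keys differently, and the
function name is hypothetical.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;def merge_collected(collected):
    """Merge (collector_name, content) pairs; later entries win on conflicts."""
    merged = {}
    for _collector_name, content in collected:
        # dict.update lets each later collector overwrite shared keys.
        merged.update(content)
    return merged
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;With the local collector listed first, a key it sets acts as a default that
any later collector providing the same key replaces.&lt;/p&gt;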
&lt;p&gt;Elements making use of this feature can install a json file into the
/var/lib/os-collect-config/local-data directory. The os-collect-config element
will be responsible for creating the /var/lib/os-collect-config/local-data
directory at build time and will create it with 0755 permissions.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;section id="os-config-files"&gt;
&lt;h4&gt;OS_CONFIG_FILES&lt;/h4&gt;
&lt;p&gt;There is already a mechanism in os-apply-config to specify arbitrary files to
look at for configuration data via setting the OS_CONFIG_FILES environment
variable. However, this is not ideal because each call to os-apply-config would
have to be prefaced with setting OS_CONFIG_FILES, or it would need to be set
globally in the environment (via an environment.d script for instance). As an
element developer, this is not clear. Having a robust and clearly documented
location to drop in configuration data will be simpler.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="heat-local-collector"&gt;
&lt;h4&gt;heat_local collector&lt;/h4&gt;
&lt;p&gt;There is already a collector that reads from local data, but it must be
configured to read explicit file paths. This does not scale well if several
elements want to each provide local configuration data, in that you’d have to
reconfigure os-collect-config itself. We could modify the heat_local collector
to read from directories instead, while maintaining backwards compatibility as
well, instead of writing a whole new collector. However, given that collectors
are pretty simple implementations, I’m proposing just writing a new one, so
that they remain generally single purpose with clear goals.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Harmful elements could drop bad configuration data into the well known
location. This is mitigated somewhat in that, as a deployer, you should know and
validate what elements you’re using that may inject local configuration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We should verify that the local data source files are not world writable and
are in a directory that is root owned. Checks to dib-lint could be added to
verify this at image build time. Checks could be added to os-collect-config
for instance run time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
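&lt;p&gt;The run time checks suggested above might look roughly like the following
sketch. The function names are hypothetical and this is not existing dib-lint
or os-collect-config code.&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;import os
import stat


def is_world_writable(path):
    """Return True if "other" users have write permission on path."""
    # stat.filemode yields e.g. "-rw-r--r--"; index 8 is the other-write flag.
    return stat.filemode(os.stat(path).st_mode)[8] == "w"


def check_local_data(directory):
    """Return a list of human readable problems found in directory."""
    problems = []
    if os.stat(directory).st_uid != 0:
        problems.append("%s is not owned by root" % directory)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if is_world_writable(path):
            problems.append("%s is world writable" % path)
    return problems
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;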
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;An additional collector will be running as part of os-collect-config, but its
execution time should be minimal.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;There will be an additional configuration option in os-collect-config to
configure the list of directories to look at for configuration data. This
will have a reasonable default and will not usually need to be changed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployers will have to consider what local data source configuration may be
influencing their current applied configuration.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;We will need to make clear in documentation when to use this feature versus
what to expose in a template or specify via passthrough configuration.
Configuration needed at image build time where you need access to those values
at instance run time as well are good candidates for using this feature.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;james-slagle&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;write new collector for os-collect-config&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;unit tests for new collector&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;document new collector&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add checks to dib-lint to verify JSON files installed to the local data
source directory are not world writable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add checks to os-collect-config to verify JSON files read by the local data
collector are not world writable and that their directory is root owned.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The configurable /mnt/state spec at:
&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-configurable-mnt-state"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-configurable-mnt-state&lt;/a&gt;
depends on this spec.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Unit tests will be written for the new collector. The new collector will also
eventually be tested in CI because there will be an existing element that will
configure the persistent data directory to /mnt/state that will make use of
this implementation.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The ability of elements to drop configuration data into a well known location
should be documented in tripleo-image-elements itself so folks can be made
better aware of the functionality.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-configurable-mnt-state"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-configurable-mnt-state&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/#/c/94876"&gt;https://review.openstack.org/#/c/94876&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 18 Jun 2014 00:00:00 </pubDate></item><item><title>Virtual IPs for public addresses</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/virtual-public-ips.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+specs/tripleo-juno-virtual-public-ips"&gt;https://blueprints.launchpad.net/tripleo/+specs/tripleo-juno-virtual-public-ips&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The current public IP feature is intended to specify the endpoint that a cloud
can be reached at. This is typically something where HA is highly desirable.&lt;/p&gt;
&lt;p&gt;Making the public IP be a virtual IP instead of locally bound to a single
machine should increase the availability of the clustered service, once we
increase the control plane scale to more than one machine.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Today, we run all OpenStack services with listening ports on one virtual IP.&lt;/p&gt;
&lt;p&gt;This means that we’re exposing RabbitMQ, MySQL and possibly other cluster-only
services to the world, when really what we want is public services exposed to
the world and cluster-only services not exposed to the world. Deployers are
(rightfully) not exposing our all-services VIP to the world, which leads to
them having to choose between a) no support for externally visible endpoints,
b) all services attackable or c) manually tracking the involved ports and
playing a catch-up game as we evolve things.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;Create a second virtual IP from a user supplied network. Bind additional copies
of API endpoints that should be publicly accessible to that virtual IP. We
need to keep presenting them internally as well (still via haproxy and the
control virtual IP) so that servers without any public connectivity such as
hypervisors can still use the APIs (though they may need to override the IP to
use in their hosts files - we have facilities for that already).&lt;/p&gt;
&lt;p&gt;The second virtual IP could in principle be on a dedicated ethernet card, or
on a VLAN on a shared card. For now, let’s require the admin to specify the
interface on which keepalived should be provisioning the shared IP - be that
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;br-ctlplane&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vlan25&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;eth2&lt;/span&gt;&lt;/code&gt;. Because the network topology may be
independent, the keepalive quorum checks need to take place on the specified
interface even though this costs external IP addresses.&lt;/p&gt;
&lt;p&gt;The user must be able to specify the same undercloud network as they do today
so that small installs are not made impossible - requiring two distinct
networks is likely hard for small organisations. Using the same network would
not imply using the same IP address - a dedicated IP address will still be
useful to permit better testing confidence and also allows for simple exterior
firewalling of the cluster.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;We could not do HA for the public endpoints - not really an option.&lt;/p&gt;
&lt;p&gt;We could not do public endpoints and instead document how to provide border
gateway firewalling and NAT through to the endpoints. This just shifts the
problem onto infrastructure we are not deploying, making it harder to deploy.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;Our security story improves by making this change, as we can potentially
start firewalling the intra-cluster virtual IP to only allow known nodes to
connect. Short of that, our security story has already improved since we started
binding to specific IPs only, as opening a new IP address no longer
exposes core services (other than ssh) on it.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;End users will need to be able to find out about the new virtual IP. That
should be straightforward via our existing mechanisms.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None anticipated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Deployers will require an additional IP address either on their undercloud
ctlplane network (small installs) or on their public network (larger/production
installs).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None expected.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;lifeless (hahahaha)&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;None.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Generalise keepalived.conf to support multiple VRRP interfaces.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add support for binding multiple IPs to the haproxy configuration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add logic to incubator and/or heat templates to request a second virtual IP.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Change heat templates to bind public services to the public virtual IP.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Possibly tweak setup-endpoints to cooperate, though the prior support
should be sufficient.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are out of scope for this, but necessary to use it - I intend to put
them in the discussion in Dan’s network overhaul spec.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add optional support to our heat templates to boot the machines with two
nics, not just one - so that we have an IP address for the public interface
when it’s a physical interface. We may find there are ordering / enumeration
issues in Nova/Ironic/Neutron to solve here.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add optional support to our heat templates for statically allocating a port
from neutron and passing it into the control plane for when we’re using
VLANs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;This will be on by default, so our default CI path will exercise it.&lt;/p&gt;
&lt;p&gt;Additionally, we’ll be using it in the upcoming VLAN test job, which will
give us confidence that it works when the networks are partitioned.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;Adding this to the manual is the main thing.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Mon, 16 Jun 2014 00:00:00 </pubDate></item><item><title>Promote HEAT_ENV</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/promote-heat-env.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-promote-heat-env"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-promote-heat-env&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Promote values set in the Heat environment file to take precedence over
input environment variables.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Historically TripleO scripts have consulted the environment for many items of
configuration. This raises risks of scope leakage and the number of environment
variables required often forces users to manage their environment with scripts.
Consequently, there’s a push to prefer data files like the Heat environment
file (HEAT_ENV) which may be set by passing -e to Heat. To allow this file to
provide an unambiguous source of truth, the environment must not be allowed to
override the values from this file. That is to say, precedence must be
transferred.&lt;/p&gt;
&lt;p&gt;A key distinction is whether the value of an environment variable is obtained
from the environment passed to the script by its parent process (either directly or
through derivation). Variables which are will be referred to as “input variables”
and are deprecated by this spec. Those which are not will be called “local
variables” and may be introduced freely. Variables containing values
synthesised from multiple sources must be handled on a case-by-case basis.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;Since changes I5b7c8a27a9348d850d1a6e4ab79304cf13697828 and
I42a9d4b85edcc99d13f7525e964baf214cdb7cbf, ENV_JSON (the contents of the file
named by HEAT_ENV) is constructed in devtest_undercloud.sh like so:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;ENV_JSON=$(jq '.parameters = {
  "MysqlInnodbBufferPoolSize": 100
} + .parameters + {
  "AdminPassword": "'"${UNDERCLOUD_ADMIN_PASSWORD}"'",
  "AdminToken": "'"${UNDERCLOUD_ADMIN_TOKEN}"'",
  "CeilometerPassword": "'"${UNDERCLOUD_CEILOMETER_PASSWORD}"'",
  "GlancePassword": "'"${UNDERCLOUD_GLANCE_PASSWORD}"'",
  "HeatPassword": "'"${UNDERCLOUD_HEAT_PASSWORD}"'",
  "NovaPassword": "'"${UNDERCLOUD_NOVA_PASSWORD}"'",
  "NeutronPassword": "'"${UNDERCLOUD_NEUTRON_PASSWORD}"'",
  "NeutronPublicInterface": "'"${NeutronPublicInterface}"'",
  "undercloudImage": "'"${UNDERCLOUD_ID}"'",
  "BaremetalArch": "'"${NODE_ARCH}"'",
  "PowerSSHPrivateKey": "'"${POWER_KEY}"'",
  "NtpServer": "'"${UNDERCLOUD_NTP_SERVER}"'"
}' &amp;lt;&amp;lt;&amp;lt; $ENV_JSON)
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is broadly equivalent to “A + B + C”, where values from B override those
from A and values from C override those from either. Currently section C
contains a mix of input variables and local variables. It is proposed that
current and future environment variables are allocated such that:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;A only contains default values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;B is the contents of the HEAT_ENV file (from either the user or a prior run).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;C only contains computed values (from local variables).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
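&lt;p&gt;The intended precedence can be sketched in plain shell, independently of jq.
In this illustration the helper &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;lookup&lt;/span&gt;&lt;/code&gt; and all sample values are
hypothetical. Each layer is a space-separated list of key=value words (so values
may not contain whitespace or glob characters), and the last layer that defines
a key wins, mirroring “A + B + C”.&lt;/p&gt;

```shell
# lookup KEY LAYER...: print the value from the last layer that defines KEY.
# Layers are space-separated key=value words; this is an illustration only.
lookup() {
  key=$1
  shift
  value=
  for layer in "$@"; do
    for pair in $layer; do
      case $pair in
        "$key"=*) value=${pair#*=} ;;
      esac
    done
  done
  printf '%s\n' "$value"
}

defaults='NeutronPublicInterface=eth0 NtpServer='   # section A: defaults
heat_env='NeutronPublicInterface=eth2'              # section B: HEAT_ENV contents
computed='undercloudImage=abc123'                   # section C: computed values

# HEAT_ENV overrides the default; no input environment variable is consulted.
iface=$(lookup NeutronPublicInterface "$defaults" "$heat_env" "$computed")
printf 'NeutronPublicInterface resolves to %s\n' "$iface"
```

&lt;p&gt;Keys that appear in only one layer, such as the computed image ID, simply
pass through unchanged.&lt;/p&gt;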
&lt;p&gt;The following are currently in section C but are not local vars:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;NeutronPublicInterface&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;'eth0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;UNDERCLOUD_NTP_SERVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The input variables will be ignored and the defaults moved into section A:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;ENV_JSON=$(jq '.parameters = {
  "MysqlInnodbBufferPoolSize": 100,
  "NeutronPublicInterface": "eth0",
  "NtpServer": ""
} + .parameters + {
  ... elided ...
}' &amp;lt;&amp;lt;&amp;lt; $ENV_JSON)
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;devtest_overcloud.sh will be dealt with similarly. These are the variables
which need to be removed and their defaults added to section A:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;OVERCLOUD_NAME&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OVERCLOUD_HYPERVISOR_PHYSICAL_BRIDGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OVERCLOUD_HYPERVISOR_PUBLIC_INTERFACE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OVERCLOUD_BRIDGE_MAPPINGS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OVERCLOUD_FLAT_NETWORKS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;NeutronPublicInterface&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;'eth0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OVERCLOUD_LIBVIRT_TYPE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;'qemu'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OVERCLOUD_NTP_SERVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Only one out of all these input variables is used outside of these two scripts
and consequently the rest are safe to remove.&lt;/p&gt;
&lt;p&gt;The exception is OVERCLOUD_LIBVIRT_TYPE. This is saved by the script
‘write-tripleorc’. As it will now be preserved in HEAT_ENV, it does not need to
also be preserved by write-tripleorc and can be removed from there.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;So that users know they need to start setting these values through HEAT_ENV
rather than input variables, it is further proposed that for an interim period
each script echo a message to STDERR if deprecated input variables are set. For
example:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;for OLD_VAR in OVERCLOUD_NAME; do
  if [ ! -z "${!OLD_VAR}" ]; then
    echo "WARNING: ${OLD_VAR} is deprecated, please set this in the" \
         "HEAT_ENV file (${HEAT_ENV})" 1&amp;gt;&amp;amp;2
  fi
done
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;To separate user input from generated values further, it is proposed that user
values be read from a new file - USER_HEAT_ENV. This will default to
{under,over}cloud-user-env.json. A new command-line parameter, --user-heat-env,
will be added to both scripts so that this can be changed.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;ENV_JSON is initialised with default values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ENV_JSON is overlaid by HEAT_ENV.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ENV_JSON is overlaid by USER_HEAT_ENV.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ENV_JSON is overlaid by computed values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ENV_JSON is saved to HEAT_ENV.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;See &lt;a class="reference external" href="http://paste.openstack.org/show/83551/"&gt;http://paste.openstack.org/show/83551/&lt;/a&gt; for an example of how to accomplish
this. In short:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;ENV_JSON=$(cat "${HEAT_ENV}" "${USER_HEAT_ENV}" | jq -s '
  .[0] + .[1] + {"parameters":
    ({..defaults..} + .[0].parameters + .[1].parameters + {..computed..})}')
cat &amp;gt; "${HEAT_ENV}" &amp;lt;&amp;lt;&amp;lt; "${ENV_JSON}"
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Choosing to move user data into a new file, rather than moving the merged data,
makes USER_HEAT_ENV optional. If users wish, they can continue providing their
values in HEAT_ENV. The alternative would require users to clean
precomputed values out of HEAT_ENV, or risk unintentionally preventing the
values from being recomputed.&lt;/p&gt;
&lt;p&gt;Loading computed values after user values sacrifices user control in favour of
correctness. Considering that any devtest user must be rather technical, if a
computation is incorrect they can fix or at least hack the computation
themselves.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;Instead of removing the input variables entirely, an interim form could be
used:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;ENV_JSON=$(jq '.parameters = {
  "MysqlInnodbBufferPoolSize": 100,
  "NeutronPublicInterface": "'"${NeutronPublicInterface}"'",
  "NtpServer": "'"${UNDERCLOUD_NTP_SERVER}"'"
} + .parameters + {
  ...
}' &amp;lt;&amp;lt;&amp;lt; $ENV_JSON)
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;However, the input variables would only have an effect if the keys they affect
are not present in HEAT_ENV. As HEAT_ENV is written each time devtest runs, the
keys will usually be present unless the file is deleted each time (rendering it
pointless). So this form is more likely to cause confusion than aid
transition.&lt;/p&gt;
&lt;hr class="docutils"/&gt;
&lt;p&gt;jq includes an ‘alternative operator’, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;//&lt;/span&gt;&lt;/code&gt;, which is intended for providing
defaults:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;A filter of the form a // b produces the same results as a, if a produces
results other than false and null. Otherwise, a // b produces the same
results as b.
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This has not been used in the proposal for two reasons:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;It only works on individual keys, not whole maps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It doesn’t work in jq 1.2, still included in Ubuntu 13.10 (Saucy).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;An announcement will be made on the mailing list when this change merges. This,
coupled with the warnings given if the deprecated variables are set, should
provide sufficient notice.&lt;/p&gt;
&lt;p&gt;As HEAT_ENV is rewritten every time devtest executes, we can safely assume it
matches the last environment used. However, users who use scripts to switch
their environment may be surprised. Overall the change should be a benefit to
these users, as they can use two separate HEAT_ENV files (passing --heat-env to
specify which to activate) instead of needing to maintain scripts to set up
their environment and risking settings leaking from one to the other.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;p&gt;lxsli&lt;/p&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add USER_HEAT_ENV to both scripts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move variables in both scripts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add deprecated variables warning to both scripts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove OVERCLOUD_LIBVIRT_TYPE from write-tripleorc.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;The change will be tested in isolation from the rest of the script.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Update usage docs with env var deprecation warnings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update usage docs to recommend HEAT_ENV.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://stedolan.github.io/jq/manual/"&gt;http://stedolan.github.io/jq/manual/&lt;/a&gt; - JQ manual&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://jqplay.herokuapp.com/"&gt;http://jqplay.herokuapp.com/&lt;/a&gt; - JQ interactive demo&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</description><pubDate>Thu, 22 May 2014 00:00:00 </pubDate></item><item><title>Triple CI improvements</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/tripleo-juno-ci-improvements.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-ci-improvements"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-ci-improvements&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;TripleO CI is painful at the moment: we have problems with both reliability
and the consistency of job run times. This spec is intended to address a
number of the problems we have been facing.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Developers should be able to depend on CI to produce reliable test results,
reported in a timely fashion and with a minimum number of false negatives. This
currently isn’t the case. To date the reliability of tripleo ci has been
heavily affected by network glitches, availability of network resources and
reliability of the CI clouds. This spec is intended to deal with the problems
we have been seeing.&lt;/p&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;&lt;strong&gt;Problem :&lt;/strong&gt; Reliability of hp1 (&lt;a class="reference internal" href="#hp1-reliability"&gt;hp1_reliability&lt;/a&gt;)&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Intermittent failures on jobs running on the hp1 cloud have been causing a
large number of job failures and sometimes taking this region down
altogether.  Current thinking is that the root of most of these issues is
problems with a Mellanox driver.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Problem :&lt;/strong&gt; Unreliable access to network resources (&lt;a class="reference internal" href="#net-reliability"&gt;net_reliability&lt;/a&gt;)&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Access to various network resources has been unreliable,
causing a CI outage whenever any one network resource is unavailable. In addition,
inconsistent speeds when downloading these resources make it difficult to
gauge overall speed improvements made to tripleo.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Problem :&lt;/strong&gt; (&lt;a class="reference internal" href="#system-health"&gt;system_health&lt;/a&gt;)&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;The health of the overall CI system isn’t
immediately obvious. Problems often persist for hours (or occasionally days)
before we react to them.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Problem :&lt;/strong&gt; (&lt;a class="reference internal" href="#ci-run-times"&gt;ci_run_times&lt;/a&gt;)&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;The tripleo devtest story takes time to run,
which uses up CI resources and developers’ time. Where possible we should
reduce the time required to run devtest.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Problem :&lt;/strong&gt; (&lt;a class="reference internal" href="#inefficient-usage"&gt;inefficient_usage&lt;/a&gt;)&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Hardware on which to run tripleo is a finite
resource. There is a spec in place to run devtest on an openstack
deployment[1], which is the best way forward in order to use the resources we
have as efficiently as possible. We also have a number of options to
explore that would help minimise resource wastage.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Problem :&lt;/strong&gt; (&lt;a class="reference internal" href="#system-feedback"&gt;system_feedback&lt;/a&gt;)&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Our CI provides no feedback about trends.
A good CI system should do more than report pass or fail: we
should be getting feedback on metrics that allow us to observe degradations,
and where possible we should make use of services already provided by infra.
This will allow us to intervene proactively as CI begins to degrade.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Problem :&lt;/strong&gt; (&lt;a class="reference internal" href="#bug-frequency"&gt;bug_frequency&lt;/a&gt;)&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;We currently have no indication of which CI
bugs are occurring most often. This frustrates efforts to make CI more
reliable.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Problem :&lt;/strong&gt; (&lt;a class="reference internal" href="#test-coverage"&gt;test_coverage&lt;/a&gt;)&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;Currently CI only tests a subset of what it should.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;There are a number of changes required in order to address the problems we have
been seeing, each listed here (in order of priority).&lt;/p&gt;
&lt;p id="hp1-reliability"&gt;&lt;strong&gt;Solution :&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Temporarily scale back on CI by removing one of the overcloud jobs (so rh1 has
the capacity to run CI solo).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove hp1 from the configuration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run burn-in tests on each hp1 host, removing (or repairing) failing hosts.
Burn-in tests should consist of running CI on a newly deployed cloud, matching
the load expected to run on the region. The failure rate should not exceed
that of currently deployed regions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Redeploy testing infrastructure on hp1 and test with tempest. This redeploy
should be done with our tripleo scripts so it can be repeated and we
are sure of parity between ci-overcloud deployments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Place hp1 back into CI and monitor the situation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add back any removed CI jobs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure burn-in / tempest tests are followed on future regions being deployed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Attempts should be made to deal with problems that develop on already
deployed clouds. If after 48 hours it becomes obvious that they can’t be dealt
with quickly, the affected clouds should be temporarily removed from the CI
infrastructure and will need to pass the burn-in tests before being added back
into production.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p id="net-reliability"&gt;&lt;strong&gt;Solution :&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Deploy a mirror of pypi.openstack.org in each region.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy a mirror of the Fedora and Ubuntu package repositories in each region.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy squid in each region and cache HTTP traffic through it. Mirroring
should be our preference where possible, but having squid in place
should cache any resources that are not mirrored.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mirror other resources (e.g. github.com, percona tarballs etc.).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Any new requirements added to devtest should be cacheable, with caches in
place before the requirement is added.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p id="system-health"&gt;&lt;strong&gt;Solution :&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Monitor our CI clouds and testenvs with Icinga. Monitoring should include
ping, starting (and connecting to) new instances, disk usage etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitor CI test results and trigger an alert if “X” number of jobs of the
same type fail in succession. An example of using logstash to monitor CI
results can be found here[5].&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
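&lt;p&gt;The “X failures in succession” rule above could be evaluated along these
lines. This is only a sketch: the function names, the result strings
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;SUCCESS&lt;/span&gt;&lt;/code&gt; and &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;FAILURE&lt;/span&gt;&lt;/code&gt;, and the threshold of 3 are assumptions made for
illustration. A real check would read job results from logstash or a similar
source rather than from command-line arguments.&lt;/p&gt;

```shell
# trailing_failures RESULT...: print how many FAILURE results end the sequence.
trailing_failures() {
  count=0
  for result in "$@"; do
    if [ "$result" = "FAILURE" ]; then
      count=$((count + 1))
    else
      count=0   # any other result breaks the run of failures
    fi
  done
  printf '%s\n' "$count"
}

# should_alert THRESHOLD RESULT...: succeed when the trailing run reaches THRESHOLD.
should_alert() {
  threshold=$1
  shift
  [ "$(trailing_failures "$@")" -ge "$threshold" ]
}

# Results are listed oldest first for a single job type.
if should_alert 3 SUCCESS FAILURE FAILURE FAILURE; then
  echo "ALERT: threshold of consecutive failures reached"
fi
```

&lt;p&gt;Counting only the trailing run means a single passing job resets the alert,
which keeps isolated failures from paging anyone.&lt;/p&gt;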
&lt;p&gt;Once consistency is no longer a problem we will investigate improvements
we can make to the speed of CI jobs.&lt;/p&gt;
&lt;p id="ci-run-times"&gt;&lt;strong&gt;Solution :&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Investigate whether unsafe disk caching strategies will speed up disk image
creation. If an improvement is found, implement it in production CI by one of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Run an “unsafe” disk caching strategy on CI cloud VMs (would involve exposing
this libvirt option via the nova api).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use “eatmydata” to noop disk sync system calls. It is not currently
packaged for F20 but we could try to restart that process[2].&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p id="inefficient-usage"&gt;&lt;strong&gt;Solution :&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Abandon on failure: add a feature to zuul (or turn it on if it already
exists) to abandon all jobs in a queue for a particular commit as soon as a
voting job fails. This would minimize the resources spent on long-running
jobs that we already know will have to be rechecked.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Adding the collectl element to compute nodes and testenv hosts will allow us
to find bottlenecks and also identify places where it is safe to overcommit
(e.g. we may find that overcommitting CPU a lot on testenv hosts is viable).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p id="system-feedback"&gt;&lt;strong&gt;Solution :&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Using a combination of logstash and graphite:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Output graphs of occurrences of false negative test results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output graphs of CI run times over time in order to identify trends.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output graphs of CI job peak memory usage over time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output graphs of CI image sizes over time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p id="bug-frequency"&gt;&lt;strong&gt;Solution :&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;In order to track the false negatives that are hurting us most, we
should agree not to use “recheck no bug” and instead recheck with the
relevant bug number. Adding signatures to Elastic recheck for known CI
issues should help uptake of this.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p id="test-coverage"&gt;&lt;strong&gt;Solution :&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Run tempest against the deployed overcloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test our upgrade story by upgrading to new images. Initially, to avoid
having to build new images, we can edit something on the overcloud qcow images
in place in order to get a set of images to upgrade to[3].&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;As an alternative to deploying our own distro mirrors we could simply point
directly at a mirror known to be reliable. This is undesirable as a long-term
solution, as we still can’t control outages.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;No longer using “recheck no bug” places a burden on developers to
investigate why a job failed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Adding coverage to our tests will increase the overall time to run a job.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Performance of CI should improve overall.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;derekh&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;looking for volunteers…&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;hp1 upgrade to trusty.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Potential pypi mirror.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fedora Mirrors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ubuntu Mirrors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mirroring other non-distro resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Per-region caching proxy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document CI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Running an unsafe disk caching strategy in the overcloud nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Zuul abandon on failure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Include collectl on compute and testenv hosts and analyse output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mechanism to monitor CI run times.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Mechanism to monitor nodepool connection failures to instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove the ability to “recheck no bug”, or at the very least discourage its use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring cloud/testenv health.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Expand CI to include tempest.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Expand CI to include upgrades.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;CI failure rate and timings will be tracked to confirm improvements.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The tripleo-ci repository needs additional documentation in order to describe
the current layout and should then be updated as changes are made.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;[1] spec to run devtest on openstack &lt;a class="reference external" href="https://review.openstack.org/#/c/92642/"&gt;https://review.openstack.org/#/c/92642/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[2] eatmydata for Fedora &lt;a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1007619"&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1007619&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[3] CI upgrades &lt;a class="reference external" href="https://review.openstack.org/#/c/87758/"&gt;https://review.openstack.org/#/c/87758/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[4] summit session &lt;a class="reference external" href="https://etherpad.openstack.org/p/juno-summit-tripleo-ci"&gt;https://etherpad.openstack.org/p/juno-summit-tripleo-ci&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[5] &lt;a class="reference external" href="http://jogo.github.io/gate/tripleo.html"&gt;http://jogo.github.io/gate/tripleo.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Thu, 22 May 2014 00:00:00 </pubDate></item><item><title>Configurable directory for persistent and stateful data</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/tripleo-juno-configurable-mnt-state.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-configurable-mnt-state"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-juno-configurable-mnt-state&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Make the hardcoded /mnt/state path for stateful data be configurable.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;1. A hard coded directory of /mnt/state for persistent data is incompatible
with the mechanism that Red Hat based distros provide for a stateful data path. Red
Hat based distros, such as Fedora, RHEL, and CentOS, have a feature that uses
bind mounts for mounting paths onto a stateful data partition and does not
require manually reconfiguring software to use /mnt/state.&lt;/p&gt;
&lt;p&gt;2. Distros that use SELinux have pre-existing policy that allows access to
specific paths. Reconfiguring these paths to be under /mnt/state results
in SELinux denials for existing services, requiring additional policy to be
written and maintained.&lt;/p&gt;
&lt;p&gt;3. Some operators and administrators find it disruptive and inconsistent when
many services are reconfigured away from the well known default values for
filesystem paths. They do not expect these changes on a distro whose
configuration they have already learned and come to rely on. These types of changes
also require documentation changes to existing documents and processes.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;Deployers will be able to choose a configurable path instead of the hardcoded
value of /mnt/state for the stateful path.&lt;/p&gt;
&lt;p&gt;A new element, stateful-path, will be added that defines the value for the
stateful path. The default will be /mnt/state.&lt;/p&gt;
&lt;p&gt;There are 3 areas that need to respect the configurable path:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;os-apply-config template generation&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;The stateful-path element will set the stateful path value by installing a
JSON file to a well known location for os-collect-config to use as a local
data source. This will require a new local data source collector to be added
to os-collect-config (See &lt;a class="reference internal" href="#dependencies"&gt;Dependencies&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The JSON file’s contents will be based on $STATEFUL_PATH, e.g.:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;{"stateful-path": "/mnt/state"}&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;p&gt;File templates (files under os-apply-config in an element) will then be
updated to replace the hard coded /mnt/state with {{stateful-path}}.&lt;/p&gt;
&lt;p&gt;Currently, there is a mix of root locations of the os-apply-config templates.
Most are written under /, although some are written under /mnt/state. The
/mnt/state is hard coded in the directory tree under os-apply-config in these
elements, so this will be removed to have the templates just written under /.
Symlinks could instead be used in these elements to setup the correct paths.
Support can also be added to os-apply-config’s control file mechanism to
indicate these files should be written under the stateful path. An example
patch that does this is at: &lt;a class="reference external" href="https://review.openstack.org/#/c/113651/"&gt;https://review.openstack.org/#/c/113651/&lt;/a&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;os-refresh-config scripts run at boot time&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;In order to make the stateful path configurable, all of the hard coded
references to /mnt/state in os-refresh-config scripts will be replaced with an
environment variable, $STATEFUL_PATH.&lt;/p&gt;
&lt;p&gt;The stateful-path element will provide an environment.d script for
os-refresh-config that reads the value from os-apply-config:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;export STATEFUL_PATH=$(os-apply-config --key stateful-path --type raw)&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/dd&gt;
&lt;dt&gt;Hook scripts run at image build time&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;The stateful-path element will provide an environment.d script for use at
image build time:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;p&gt;export STATEFUL_PATH=${STATEFUL_PATH:-"/mnt/state"}&lt;/p&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/dd&gt;
&lt;/dl&gt;
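&lt;p&gt;The template substitution described above can be sketched in Python. This is
only an illustration under stated assumptions: os-apply-config renders real
Mustache templates, whereas the hypothetical helper render_template below does a
plain string replacement of flat keys, and the sample template line is made up
for the example.&lt;/p&gt;
&lt;pre&gt;
```python
import json

# Metadata as the stateful-path element would install it (JSON, one key).
LOCAL_METADATA = '{"stateful-path": "/mnt/state"}'


def render_template(template, metadata):
    """Replace each {{key}} placeholder with its value from metadata.

    Hypothetical stand-in for os-apply-config's Mustache rendering;
    it handles only flat string keys.
    """
    rendered = template
    for key, value in metadata.items():
        rendered = rendered.replace("{{" + key + "}}", value)
    return rendered


metadata = json.loads(LOCAL_METADATA)
# Made-up template line; previously it would have hard coded /mnt/state.
template = "state_path={{stateful-path}}/var/lib/nova\n"
print(render_template(template, metadata), end="")
```
&lt;/pre&gt;
&lt;p&gt;Changing the value in the JSON data source is then enough to move every
templated path, without editing the templates themselves.&lt;/p&gt;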
&lt;p&gt;The use-ephemeral element will depend on the stateful-path element, effectively
making the default stateful path remain /mnt/state.&lt;/p&gt;
&lt;p&gt;The stateful path can be reconfigured by defining $STATEFUL_PATH either A) in
the environment before an image build; or B) in an element with an
environment.d script which runs earlier than the stateful-path environment.d
script.&lt;/p&gt;
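&lt;p&gt;The precedence rule can be illustrated with a short Python sketch. It assumes
the build-time script keeps any non-empty value that is already set and only
falls back to /mnt/state otherwise, which is how the shell default expansion
behaves; the function name resolve_stateful_path and the sample path
/var/lib/stateful are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
DEFAULT_STATEFUL_PATH = "/mnt/state"


def resolve_stateful_path(environ):
    """Keep a non-empty STATEFUL_PATH that is already set, otherwise
    fall back to the default, like ${STATEFUL_PATH:-/mnt/state}."""
    return environ.get("STATEFUL_PATH") or DEFAULT_STATEFUL_PATH


# Nothing set beforehand: the default applies.
print(resolve_stateful_path({}))
# Set earlier, by the build environment or an earlier environment.d script.
print(resolve_stateful_path({"STATEFUL_PATH": "/var/lib/stateful"}))
# Set but empty: the ":-" form treats this the same as unset.
print(resolve_stateful_path({"STATEFUL_PATH": ""}))
```
&lt;/pre&gt;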
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;None come to mind; the point of this spec is to enable an alternative to what
already exists. There may be additional alternatives that other folks
wish to add support for.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;End users using elements that change the stateful path location from /mnt/state
to something else will see this change reflected in configuration files and in
the directories used for persistent and stateful data. They will have to know
how the stateful path is configured and accessed.&lt;/p&gt;
&lt;p&gt;Different TripleO installs would appear different if used with elements that
configured the stateful path differently.&lt;/p&gt;
&lt;p&gt;This also adds some complexity when reading TripleO code, because instead of
there being an explicit path, there would instead be a reference to a
configurable value.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;There will be additional logic in os-refresh-config to determine and set the
stateful path, and an additional local collector that os-collect-config would
use. However, these are negligible in terms of negatively impacting
performance.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;Deployers will be able to choose different elements that may reconfigure the
stateful path or change the value for $STATEFUL_PATH. The default will remain
unchanged however.&lt;/p&gt;
&lt;p&gt;Deployers would have to know what the stateful path is, and if it’s different
across their environment, this could be confusing. However, this seems unlikely
as deployers are likely to be standardizing on one set of common elements,
distro, etc.&lt;/p&gt;
&lt;p&gt;In the future, if TripleO CI and CD clouds that are based on Red Hat distros
make use of this feature to enable Red Hat read only root support, then these
clouds would be configured differently from clouds that are configured to use
/mnt/state. As a team, the tripleo-cd-admins will have to know which
configuration has been used.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;1. Developers need to use the $STATEFUL_PATH and {{stateful-path}}
substitutions when they intend to refer to the stateful path.&lt;/p&gt;
&lt;p&gt;2. Code that needs to know the stateful path will need access to the variable
defining the path; it won’t be able to assume the path is /mnt/state. A call to
os-apply-config to query the key defining the path could be done to get
the value, as long as os-collect-config has already run at least once.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;james-slagle&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;section id="tripleo-incubator"&gt;
&lt;h4&gt;tripleo-incubator&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Update troubleshooting docs to mention that /mnt/state is a configurable
path, and could be different in local environments.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="tripleo-image-elements"&gt;
&lt;h4&gt;tripleo-image-elements&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Add a new stateful-path element that configures stateful-path and $STATEFUL_PATH
to /mnt/state&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update os-apply-config templates to replace /mnt/state with {{stateful-path}}&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update os-refresh-config scripts to replace /mnt/state with $STATEFUL_PATH&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update all elements that have os-apply-config template files under /mnt/state
to just be under /.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;update os-apply-config element to call os-apply-config with a --root
$STATEFUL_PATH option&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;update elements that have paths to os-apply-config generated files (such
as /etc/nova/nova.conf) to refer to those paths as
$STATEFUL_PATH/path/to/file.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;make use-ephemeral element depend on stateful-path element&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;os-collect-config will need a new feature to read from a local data source
directory that elements can install JSON files into, such as a source.d. There
will be a new spec filed on this feature.
&lt;a class="reference external" href="https://review.openstack.org/#/c/100965/"&gt;https://review.openstack.org/#/c/100965/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;os-apply-config will need an option in its control file to support
generating templates under the configurable stateful path. There is a patch
here: &lt;a class="reference external" href="https://review.openstack.org/#/c/113651/"&gt;https://review.openstack.org/#/c/113651/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;There is currently no testing that all stateful and persistent data is actually
written to a stateful partition.&lt;/p&gt;
&lt;p&gt;We should add tempest tests that directly exercise the preserve_ephemeral
option, and have tests that check that all stateful data has been preserved
across a “nova rebuild”. Tempest seems like a reasonable place to add these
tests since preserve_ephemeral is a Nova OpenStack feature. Plus, once TripleO
CI is running tempest against the deployed OverCloud, we will be testing this
feature.&lt;/p&gt;
&lt;p&gt;We should also test in TripleO CI that state is preserved across a rebuild by
adding stateful data before a rebuild and verifying it is still present after a
rebuild.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We will document the new stateful-path element.&lt;/p&gt;
&lt;p&gt;TripleO documentation will need to mention the potential difference in
configuration files and the location of persistent data if a value other than
/mnt/state is used.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;os-collect-config local datasource collector spec:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://review.openstack.org/100965"&gt;https://review.openstack.org/100965&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Red Hat style stateful partition support this will enable:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://git.fedorahosted.org/cgit/initscripts.git/tree/systemd/fedora-readonly"&gt;https://git.fedorahosted.org/cgit/initscripts.git/tree/systemd/fedora-readonly&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://git.fedorahosted.org/cgit/initscripts.git/tree/sysconfig/readonly-root"&gt;https://git.fedorahosted.org/cgit/initscripts.git/tree/sysconfig/readonly-root&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://git.fedorahosted.org/cgit/initscripts.git/tree/statetab"&gt;https://git.fedorahosted.org/cgit/initscripts.git/tree/statetab&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://git.fedorahosted.org/cgit/initscripts.git/tree/rwtab"&gt;https://git.fedorahosted.org/cgit/initscripts.git/tree/rwtab&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Thu, 22 May 2014 00:00:00 </pubDate></item><item><title>TripleO Deploy Cloud Hypervisor Type</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/tripleo-juno-deploy-cloud-hypervisor-type.html</link><description>
 
&lt;p&gt;# TODO: file the actual blueprint…
&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-deploy-cloud-hypervisor-type"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-deploy-cloud-hypervisor-type&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The goal of this spec is to detail how the TripleO deploy cloud type could be
varied from just baremetal to baremetal plus other hypervisors to deploy
Overcloud services.&lt;/p&gt;
&lt;p&gt;Linux kernel containers make this approach attractive because they virtualize
and isolate services and processes in a lightweight way, so libvirt+lxc and
Docker are likely targets. However we should
aim to make this approach as agnostic as possible for those deployers who may
wish to use any Nova driver, such as libvirt+kvm.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;The overcloud control plane is generally lightly loaded and allocation of
entire baremetal machines to it is wasteful. Also, when the Overcloud services
are running entirely on baremetal they take longer to upgrade and rollback.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;We should support any Nova virtualization type as a target for Overcloud
services, as opposed to using baremetal nodes to deploy overcloud images.
Containers are particularly attractive because they are lightweight, easy to
upgrade/rollback and offer isolation and security similar to full VMs. For the
purpose of this spec, the alternate Nova virtualization target for the
Overcloud will be referred to as alt-hypervisor. alt-hypervisor could be
substituted with libvirt+lxc, Docker, libvirt+kvm, etc.&lt;/p&gt;
&lt;p&gt;At a minimum, we should support running each Overcloud service in isolation in
its own alt-hypervisor instance in order to be as flexible as possible to deployer
needs. We should also support combining services.&lt;/p&gt;
&lt;p&gt;In order to make other alt-hypervisors available as deployment targets for the
Overcloud, we need additional Nova Compute nodes/services configured to use
alt-hypervisors registered with the undercloud Nova.&lt;/p&gt;
&lt;p&gt;Additionally, the undercloud must still be running a Nova compute with the
ironic driver in order to allow for scaling itself out to add additional
undercloud compute nodes.&lt;/p&gt;
&lt;p&gt;To accomplish this, we can run 2 Nova compute processes on each undercloud
node.  One configured with Nova+Ironic and one configured with
Nova+alt-hypervisor.  For the straight baremetal deployment, where an alternate
hypervisor is not desired, the additional Nova compute process would not be
included. This would be accomplished via the standard inclusion/exclusion of
elements during a diskimage-builder tripleo image build.&lt;/p&gt;
&lt;p&gt;It will also be possible to build and deploy just an alt-hypervisor compute
node that is registered with the Undercloud as an additional compute node.&lt;/p&gt;
&lt;p&gt;To minimize the changes needed to the elements, we will aim to run a full init
stack in each alt-hypervisor instance, such as systemd. This will allow all the
services that we need to also be running in the instance (cloud-init,
os-collect-config, etc). It will also make troubleshooting similar to the
baremetal process in that you’d be able to ssh to individual instances, read
logs, restart services, turn on debug mode, etc.&lt;/p&gt;
&lt;p&gt;To handle Neutron network configuration for the Overcloud, the Overcloud
neutron L2 agent will have to be on a provider network that is shared between
the hypervisors. VLAN provider networks will have to be modeled in Neutron and
connected to alt-hypervisor instances.&lt;/p&gt;
&lt;p&gt;Overcloud compute nodes themselves would be deployed to baremetal nodes. These
images would be made up of:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;libvirt+kvm (assuming this is the hypervisor choice for the Overcloud)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nova-compute + libvirt+kvm driver (registered to overcloud control)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;neutron-l2-agent (registered to overcloud control)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An image with those contents is deployed to a baremetal node via nova+ironic
from the undercloud.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;section id="deployment-from-the-seed"&gt;
&lt;h4&gt;Deployment from the seed&lt;/h4&gt;
&lt;p&gt;An alternative to having the undercloud deploy additional alt-hypervisor
compute nodes would be to register additional baremetal nodes with the seed vm,
and then describe an undercloud stack in a template that is the undercloud
controller and its set of alt-hypervisor compute nodes.  When the undercloud
is deployed via the seed, all of the nodes are set up initially.&lt;/p&gt;
&lt;p&gt;The drawback with that approach is that the seed is meant to be short-lived in
the long term. So, it then becomes difficult to scale out the undercloud if
needed. We could offer a hybrid of the 2 models: launch all nodes initially
from the seed, but still have the functionality in the undercloud to deploy
more alt-hypervisor compute nodes if needed.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="the-init-process"&gt;
&lt;h4&gt;The init process&lt;/h4&gt;
&lt;p&gt;If running systemd in a container turns out to be problematic, it should be
possible to run a single process in the container that starts just the
OpenStack service that we care about. However that process would also need to
do things like read Heat metadata. It’s possible this process could be
os-collect-config. This would require more changes to the elements
themselves, however, since they currently depend heavily on an init process
to enable and restart services. It may be possible to replace os-svc-*
with other tools that don’t use systemd or upstart when you’re building images
for containers.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;We should aim for equivalent security when deploying to alt-hypervisor
instances as we do when deploying to baremetal. To the best of our ability, it
should not be possible to compromise the instance if an individual service is
compromised.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Since Overcloud services and Undercloud services would be co-located on the
same baremetal machine, compromising the hypervisor and gaining access to the
host is a risk to both the Undercloud and Overcloud. We should mitigate this
risk to the best of our ability via things like SELinux, and removing all
unnecessary software/processes from the alt-hypervisor instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Certain hypervisors are inherently more secure than others. libvirt+kvm uses
virtualization and is much more secure than container based hypervisors such as
libvirt+lxc and Docker which use namespacing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None. The impact of this change is limited to Deployers. End users should have
no visibility into the actual infrastructure of the Overcloud.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;Ideally, deploying an overcloud to containers should result in a faster
deployment than deploying to baremetal. Upgrading and downgrading the Overcloud
should also be faster.&lt;/p&gt;
&lt;p&gt;More images will have to be built via diskimage-builder however, which will
take more time.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;The main impact to deployers will be the ability to use alt-hypervisor
instances, such as containers, if they wish. They also must understand how to
use nova-baremetal/ironic on the undercloud to scale out the undercloud and add
additional alt-hypervisor compute nodes if needed.&lt;/p&gt;
&lt;p&gt;Additional space in the configured glance backend would also likely be needed
to store additional images.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Developers working on TripleO will have the option of deploying to
alt-hypervisor instances.  This should make testing and developing on some
aspects of TripleO easier because fewer VMs are needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;More images will have to be built due to the greater potential variety with
alt-hypervisor instances housing Overcloud services.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;james-slagle&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;section id="tripleo-incubator"&gt;
&lt;h4&gt;tripleo-incubator&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;document how to use an alternate hypervisor for the overcloud deployment&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;eventually, this could possibly be the default&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;document how to troubleshoot this type of deployment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;need a user option or json property to describe if the devtest
environment being set up should use an alternate hypervisor for the overcloud
deployment or not. Consider using HEAT_ENV where appropriate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;load-image should be updated to add an additional optional argument that sets
the hypervisor_type property on the loaded images in glance. The argument is
optional and wouldn’t need to be specified for some images, such as regular
dib images that can run under KVM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document commands to setup-neutron for modeling provider VLAN networks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="tripleo-image-elements"&gt;
&lt;h4&gt;tripleo-image-elements&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;add new element for nova docker driver&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;add new element for docker registry (currently required by nova docker
driver)&lt;/p&gt;&lt;/li&gt;
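&lt;li&gt;&lt;p&gt;more hypervisor specific configuration files for the different nova compute
driver elements&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;/etc/nova/compute/nova-kvm.conf&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;/etc/nova/compute/nova-baremetal.conf&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;/etc/nova/compute/nova-ironic.conf&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;/etc/nova/compute/nova-docker.conf&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Separate configuration options per compute process for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;host (undercloud-kvm, undercloud-baremetal, etc).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;state_path (/var/lib/nova-kvm, /var/lib/nova-baremetal, etc).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;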
&lt;li&gt;&lt;p&gt;Maintain backwards compatibility in the elements by consulting both old and
new heat metadata key namespaces.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
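&lt;p&gt;As a rough sketch of the per-process configuration listed above, the
following Python derives a config file path plus host and state_path overrides
for each compute driver. The kvm and baremetal values come from the list; the
ironic and docker values simply extend the same naming pattern and are
assumptions, as are the helper names compute_process_settings and
render_config.&lt;/p&gt;
&lt;pre&gt;
```python
import configparser
import io

# Driver names taken from the configuration file names listed above.
DRIVERS = ["kvm", "baremetal", "ironic", "docker"]


def compute_process_settings(driver):
    """Return the per-process config file path and [DEFAULT] overrides."""
    return {
        "config_file": "/etc/nova/compute/nova-" + driver + ".conf",
        "host": "undercloud-" + driver,
        "state_path": "/var/lib/nova-" + driver,
    }


def render_config(driver):
    """Render the overrides for one compute process as INI text."""
    settings = compute_process_settings(driver)
    parser = configparser.ConfigParser()
    parser["DEFAULT"] = {
        "host": settings["host"],
        "state_path": settings["state_path"],
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


for driver in DRIVERS:
    print(compute_process_settings(driver)["config_file"])
    print(render_config(driver))
```
&lt;/pre&gt;
&lt;p&gt;Giving each process its own host name and state_path keeps the two compute
services on one undercloud node from registering under the same name or sharing
on-disk state.&lt;/p&gt;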
&lt;/section&gt;
&lt;section id="tripleo-heat-templates"&gt;
&lt;h4&gt;tripleo-heat-templates&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Split out heat metadata into separate namespaces for each compute process
configuration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the vlan case, update templates for any network modeling for
alt-hypervisor instances so that those instances have correct interfaces
attached to the vlan network.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="diskimage-builder"&gt;
&lt;h4&gt;diskimage-builder&lt;/h4&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;add ability where needed to build new image types for alt-hypervisor&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Docker&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;libvirt+lxc&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Document how to build images for the new types&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;For Docker support, this effort depends on continued development on the nova
Docker driver. We would need to drive any missing features or bug fixes that
were needed in that project.&lt;/p&gt;
&lt;p&gt;For other drivers that may not be as well supported as libvirt+kvm, we will
also have to drive missing features there as well if we want to support them,
such as libvirt+lxc, openvz, etc.&lt;/p&gt;
&lt;p&gt;This effort also depends on the provider resource templates spec (unwritten)
that will be done for the template backend for Tuskar. That work should be done
in such a way that the provider resource templates are reusable for this effort
as well in that you will be able to create templates to match the images that
you intend to create for your Overcloud deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;We would need a separate set of CI jobs that were configured to deploy an
Overcloud to each alternate hypervisor that TripleO intended to support well.&lt;/p&gt;
&lt;p&gt;For Docker support specifically, CI jobs could be considered non-voting since
they’d rely on a stackforge project which isn’t officially part of OpenStack.
We could potentially make this job voting if TripleO CI was enabled on the
stackforge/nova-docker repo so that changes there are less likely to break
TripleO deployments.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;We should update the TripleO specific docs in tripleo-incubator to document how
to use an alternate hypervisor for an Overcloud deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Juno Design Summit etherpad: &lt;a class="reference external" href="https://etherpad.openstack.org/p/juno-summit-tripleo-and-docker"&gt;https://etherpad.openstack.org/p/juno-summit-tripleo-and-docker&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nova-docker driver: &lt;a class="reference external" href="https://git.openstack.org/cgit/stackforge/nova-docker"&gt;https://git.openstack.org/cgit/stackforge/nova-docker&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Docker: &lt;a class="reference external" href="https://www.docker.io/"&gt;https://www.docker.io/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Docker github: &lt;a class="reference external" href="https://github.com/dotcloud/docker"&gt;https://github.com/dotcloud/docker&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
</description><pubDate>Wed, 21 May 2014 00:00:00 </pubDate></item><item><title>QuintupleO - TripleO on OpenStack</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/tripleo-on-openstack.html</link><description>
 
&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/tripleo-on-openstack"&gt;https://blueprints.launchpad.net/tripleo/+spec/tripleo-on-openstack&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is intended as a new way to do a TripleO deployment in a virtualized
environment.  Rather than provisioning the target virtual machines directly
via virsh, we would be able to use the standard OpenStack apis to create and
manage the instances.  This should make virtual TripleO environments more
scalable and easier to manage.&lt;/p&gt;
&lt;p&gt;Ultimately the goal would be to make it possible to do virtual TripleO
deployments on any OpenStack cloud, except where necessary features have
explicitly been disabled.  We would like to have the needed features
available on the public clouds used for OpenStack CI, so existing providers
are invited to review this specification.&lt;/p&gt;
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;TripleO development and testing requires a lot of hardware resources, and
this is only going to increase as things like HA are enabled by default.
In addition, we are going to want to be able to test larger deployments than
will fit on a single physical machine.  While it would be possible to set
this up manually, OpenStack already provides services capable of managing
a large number of physical hosts and virtual machines, so it doesn’t make
sense to reinvent the wheel.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Write a virtual power driver for OpenStack instances.  I already have a
rough version for nova-baremetal, but it needs a fair amount of cleaning up
before it could be merged into the main codebase.  We will also need to
work with the Ironic team to enable this functionality there.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Determine whether changes are needed in Neutron to allow us to run our own
DHCP server, and if so work with the Neutron team to make those changes.
This will probably require allowing an instance to be booted without any
IP assigned.  If not, booting an instance without an IP would be a good
future enhancement to avoid wasting IP quota.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Likewise, determine how to use virtual IPs with keepalived/corosync+pacemaker
in Neutron, and if changes to Neutron are needed work with their team to
enable that functionality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable PXE booting in Nova.  There is already a bug open to track this
feature request, but it seems to have been abandoned.  See the link in the
References section of this document.  Ideally this should be enabled on a
per-instance basis so it doesn’t require a specialized compute node, which
would not allow us to run on a standard public cloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For performance and feature parity with the current virtual devtest
environment, we will want to allow the use of unsafe caching for the
virtual baremetal instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once all of the OpenStack services support this use case we will want to
convert our CI environment to a standard OpenStack KVM cloud, as well as
deprecate the existing method of running TripleO virtually and enable
devtest to install and configure a local OpenStack installation (possibly
using devstack) on which to run.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Depending on the state of our container support at that time, we may want
to run the devtest OpenStack using containers to avoid taking over the host
system the way devstack normally does.  This may call for its own spec when
we reach that point.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;There’s no real alternative to writing a virtual power driver.  We have to
be able to manage OpenStack instances as baremetal nodes for this to work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Creating a flat Neutron network connected to a local bridge can address the
issues with Neutron not allowing DHCP traffic, but that only works if you
have access to create the local bridge and configure the new network.  This
may not be true in many (all?) public cloud providers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I have not done any work with virtual IP addresses in Neutron yet, so it’s
unclear to me whether any alternatives exist for that.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As noted earlier, using an iPXE image can allow PXE booting of Nova
instances.  However, because that image is overwritten during the deploy,
it is not possible to PXE boot the instance afterward.  Making the TripleO
images bootable on their own might be an option, but it would diverge from
how a real baremetal environment would work and thus is probably not
desirable.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section id="deploy-overcloud-without-pxe-boot"&gt;
&lt;h4&gt;Deploy overcloud without PXE boot&lt;/h4&gt;
&lt;p&gt;Since a number of the complications around doing TripleO development on an
OpenStack cloud relate to PXE booting the instances, one option that could
be useful in some situations is the ability to deploy images directly.  Since
we’re using Heat for deployments, it should be possible to build the TripleO
images with the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;vm&lt;/span&gt;&lt;/code&gt; element and deploy them as regular instances instead of
fake baremetal ones.&lt;/p&gt;
&lt;p&gt;This has the drawback of not exercising as much of the TripleO baremetal
functionality as a full virtual PXE boot process, but it should be easier to
implement, and for some development work not related to the deploy process
would be sufficient for verifying that a feature works as intended.  It might
serve as a good intermediate step while we work to enable full PXE boot
functionality in OpenStack clouds.&lt;/p&gt;
&lt;p&gt;It would also prevent exercising HA functionality because we would likely not
be able to use virtual IP addresses if we can’t use DHCP/PXE to manage our
own networking environment.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The virtual power driver is going to need access to OpenStack
credentials so it can control the instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Neutron changes to allow private networks to behave as flat networks
may have security impacts, though I’m not exactly sure what they would be.
The same applies to virtual IP support.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PXE booting instances could in theory allow an attacker to override the
DHCP server and boot arbitrary images, but in order to do that they would
already need to have access to the private network being used, so I don’t
consider this a significant new threat.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;End users doing proofs of concept using a virtual deployment environment
would need to be switched to this new method, but that should be largely
taken care of by the necessary changes to devtest since that’s what would
be used for such a deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;In my testing, my OpenStack virtual power driver was significantly slower
than the existing virsh-based one, but I believe with a better implementation
that could be easily solved.&lt;/p&gt;
&lt;p&gt;When running TripleO on a public cloud, a developer would be subject to the
usual limitations of shared hardware - a given resource may be oversubscribed
and cause performance issues for the processing or disk-heavy operations done
by a TripleO deployment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;This is not intended to be visible to regular deployers, but it should
make our CI environment more flexible by allowing more dynamic allocation
of resources.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;If this becomes the primary method of doing TripleO development, devtest would
need to be altered to either point at an existing OpenStack environment or
to configure a local one itself.  This will have an impact on how developers
debug problems with their environment, but since they would be debugging
OpenStack in that case it should be beneficial in the long run.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;bnemec&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;jang&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Implement an Ironic OpenStack virtual power driver.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement a nova-baremetal OpenStack virtual power driver, probably out
of tree based on the feedback we’re getting from Nova and Ironic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable PXE booting of Nova instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Allow unsafe caching to be enabled on Nova instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Allow DHCP/PXE traffic on private networks in Neutron.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If not already covered by the previous point, allow booting of instances
without IP addresses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migrate CI to use an OpenStack cloud for its virtual baremetal instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migrate devtest to install and configure an OpenStack cloud instead of
managing instances and networking manually.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To simplify the VM provisioning process, we should make it possible to
provision but not boot a Nova VM.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;The Ironic, Neutron, and Nova changes in the Work Items section will all have
to be done before TripleO can fully adopt this feature.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;All changes in the other projects will be unit and functional tested as
would any other new feature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We cannot test this functionality by running devstack to provision an
OpenStack cloud in a gate VM, such as would be done for Tempest, because
the performance of the nested qemu virtual machines would make the process
prohibitively slow.  We will need to have a baremetal OpenStack deployment
that can be targeted by the tests.  A similar problem exists today with
virsh instances, however, and it can probably be solved in a similar
fashion with dedicated CI environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will need to have Tempest tests gating on all the projects we use to
exercise the functionality we depend on.  This should be largely covered
by the functional tests for the first point, but it’s possible we will find
TripleO-specific scenarios that need to be added as well.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;devtest will need to be updated to reflect the new setup steps needed to run
it against an OpenStack-based environment.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;This is largely based on the discussion “Devtest on OpenStack” in
&lt;a class="reference external" href="https://etherpad.openstack.org/p/devtest-env-reqs"&gt;https://etherpad.openstack.org/p/devtest-env-reqs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nova bug requesting PXE booting support:
&lt;a class="reference external" href="https://bugs.launchpad.net/nova/+bug/1183885"&gt;https://bugs.launchpad.net/nova/+bug/1183885&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</description><pubDate>Wed, 07 May 2014 00:00:00 </pubDate></item><item><title>Control mechanism for os-apply-config</title><link>https://specs.openstack.org/openstack/tripleo-specs/specs/juno/oac-header.html</link><description>
 
&lt;section id="problem-description"&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;We require a control mechanism in os-apply-config (oac). This could be used,
for example, to:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Not create an empty target&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set permissions on the target&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="proposed-change"&gt;
&lt;h2&gt;Proposed Change&lt;/h2&gt;
&lt;p&gt;The basic proposal is to parameterise oac with maps (aka dictionaries)
containing control data. These maps will be supplied as YAML in companion
control files. Each file will be named after the template it relates to, with a
“.oac” suffix. For example, the file “abc/foo.sh” would be controlled by
“abc/foo.sh.oac”.&lt;/p&gt;
&lt;p&gt;Only control files with matching template files will be respected, i.e. the file
“foo” must exist for the control file “foo.oac” to have any effect. A dib-lint
check will be added to look for file control files without matching templates,
as this may indicate a template has been moved without its control file.&lt;/p&gt;
&lt;p&gt;Directories may also have control files. In this case, the control file must be
inside the directory and be named exactly “oac”. Files named “oac” or ending in
the control file suffix “.oac” will never be treated as templates.&lt;/p&gt;
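&lt;p&gt;The naming rules above can be sketched in a few lines of Python. This is only
an illustration, not the oac implementation: the function names are hypothetical,
and paths are assumed to be relative, “/”-separated template paths.&lt;/p&gt;

```python
import posixpath

CONTROL_SUFFIX = ".oac"   # suffix for file control files, per the spec
DIR_CONTROL_NAME = "oac"  # name of a directory control file, per the spec


def is_control_file(path):
    """Return True if path names a control file rather than a template."""
    name = posixpath.basename(path)
    return name == DIR_CONTROL_NAME or name.endswith(CONTROL_SUFFIX)


def control_path_for(path, is_dir=False):
    """Return where the control file for a template or directory would live."""
    if is_dir:
        return posixpath.join(path, DIR_CONTROL_NAME)
    return path + CONTROL_SUFFIX


def detached_control_files(paths):
    """List file control files whose template is missing (the lint case)."""
    present = set(paths)
    return sorted(
        p for p in present
        if p.endswith(CONTROL_SUFFIX) and p[: -len(CONTROL_SUFFIX)] not in present
    )
```

&lt;p&gt;With these helpers, “abc/foo.sh” maps to “abc/foo.sh.oac” and the directory
“abc” maps to “abc/oac”. A lint check along the lines of the proposed dib-lint
addition could report whatever detached_control_files returns.&lt;/p&gt;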
&lt;p&gt;The YAML in the control file must evaluate to nothing or a mapping. The former
allows for the whole mapping having been commented out. The presence of
unrecognised keys in the mapping is an error. File and directory control keys
are distinct but may share names. If they do, they should also share similar
semantics.&lt;/p&gt;
&lt;p&gt;Example control file:&lt;/p&gt;
&lt;div class="highlight-default notranslate"&gt;&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span/&gt;&lt;span class="n"&gt;key1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;true&lt;/span&gt;
&lt;span class="n"&gt;key2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0700&lt;/span&gt;
&lt;span class="c1"&gt;# comment&lt;/span&gt;
&lt;span class="n"&gt;key3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
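&lt;p&gt;The rule that a control file must evaluate to nothing or a mapping, and that
unrecognised keys are an error, might be enforced roughly as follows. This is a
sketch under stated assumptions: the YAML is taken to be already parsed (for
example by a YAML library), the key registries and the function name are
hypothetical, and the type check is an extra not spelled out in this spec.&lt;/p&gt;

```python
# Hypothetical registries of recognised keys; the spec only defines allow_empty.
FILE_KEYS = {"allow_empty": bool}
DIR_KEYS = {"allow_empty": bool}


def validate_control(data, known_keys):
    """Validate parsed control-file YAML and return it as a dict.

    data is whatever a YAML parser returned: None for an empty or fully
    commented-out file, otherwise it must be a mapping.
    """
    if data is None:
        return {}
    if not isinstance(data, dict):
        raise ValueError("control file must be empty or a mapping")
    unknown = sorted(str(key) for key in data if key not in known_keys)
    if unknown:
        raise ValueError("unrecognised control keys: " + ", ".join(unknown))
    for key, value in data.items():
        # Assumption: each key declares the Python type it expects.
        if not isinstance(value, known_keys[key]):
            raise ValueError("wrong type for control key " + key)
    return dict(data)
```

&lt;p&gt;Under this sketch the example control file above would be rejected, since
key1, key2 and key3 are placeholders rather than recognised keys.&lt;/p&gt;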
&lt;p&gt;To make the design concrete, one file control key will be offered initially:
allow_empty. This expects a Boolean value and defaults to true. If it is true,
oac will behave as it does today. Otherwise, if after substitutions the
template body is empty, no file will be created at the target path and any
existing file there will be deleted.&lt;/p&gt;
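&lt;p&gt;As a rough illustration of the file behaviour just described, the write step
could look like the sketch below. The function name and signature are
hypothetical, and “empty” is assumed to mean a zero-length body; the spec does
not say whether whitespace-only output counts as empty.&lt;/p&gt;

```python
import os


def write_target(target_path, rendered, allow_empty=True):
    """Write a rendered template, honouring allow_empty.

    Returns True if a file exists at target_path afterwards.
    """
    if rendered == "" and not allow_empty:
        # Empty body and empty targets are not allowed: remove any stale file.
        if os.path.exists(target_path):
            os.remove(target_path)
        return False
    with open(target_path, "w") as handle:
        handle.write(rendered)
    return True
```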
&lt;p&gt;allow_empty will also be allowed as a directory control key. Again, it will
expect a Boolean value and default to true. Given a nested structure
“A/B/C/foo”, where “foo” is an empty file with allow_empty=false:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;C has allow_empty=false: A/B/ is created, C is not.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;B has allow_empty=false: A/B/C/ is created.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;B and C have allow_empty=false: Only A/ is created.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
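&lt;p&gt;One way to read the three cases above is that a directory with
allow_empty=false is skipped whenever nothing inside it ends up being created.
The sketch below models that reading on an in-memory tree. The tuple-based tree
representation and the function names are assumptions made for illustration
only, and A is assumed to keep the default allow_empty=true.&lt;/p&gt;

```python
def surviving_paths(name, node, prefix=""):
    """Return the paths that would be created for node, or [] if none.

    A file node is a tuple (body, allow_empty); a directory node is a tuple
    (children_dict, allow_empty).
    """
    content, allow_empty = node
    path = prefix + name
    if isinstance(content, dict):
        created = []
        for child_name in sorted(content):
            created += surviving_paths(child_name, content[child_name], path + "/")
        if not created and not allow_empty:
            return []  # directory would be empty, so skip it
        return [path + "/"] + created
    if content == "" and not allow_empty:
        return []  # empty file that must not be created
    return [path]


def tree(b_allow, c_allow):
    """Build the A/B/C/foo example with the given directory settings."""
    foo = ("", False)
    c = ({"foo": foo}, c_allow)
    b = ({"C": c}, b_allow)
    return ({"B": b}, True)
```

&lt;p&gt;Calling surviving_paths("A", tree(True, False)) yields A/ and A/B/ only,
matching the first case, and the other two cases follow the same way.&lt;/p&gt;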
&lt;p&gt;It is expected that additional keys will be proposed soon after this spec is
approved.&lt;/p&gt;
&lt;section id="alternatives"&gt;
&lt;h3&gt;Alternatives&lt;/h3&gt;
&lt;p&gt;A fenced header could be used rather than a separate control file. Although
this aids visibility of the control data, it is less consistent with control
files for directories and (should they be added later) symlinks.&lt;/p&gt;
&lt;p&gt;The directory control file name has been the subject of some debate.
Alternatives to control “foo/” include:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;foo/.oac (not visible with unmodified “ls”)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;foo/oac.control (longer)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;foo/control (generic)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;foo.oac (if foo/ is empty, can’t be stored in git)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;foo/foo.oac (masks control file for foo/foo)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section id="security-impact"&gt;
&lt;h3&gt;Security Impact&lt;/h3&gt;
&lt;p&gt;None. The user is already in full control of the target environment. For
example, they could use the allow_empty key to delete a critical file. However,
they could already simply provide a bash script to do the same. Further, the
resulting image will be running on their (or their customer’s) hardware, so it
would be their own foot they’d be shooting.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-end-user-impact"&gt;
&lt;h3&gt;Other End User Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="performance-impact"&gt;
&lt;h3&gt;Performance Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="other-deployer-impact"&gt;
&lt;h3&gt;Other Deployer Impact&lt;/h3&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="developer-impact"&gt;
&lt;h3&gt;Developer Impact&lt;/h3&gt;
&lt;p&gt;It will no longer be possible to create files named “oac” or with the suffix
“.oac” using oac. This will not affect any elements currently within
diskimage-builder or tripleo-image-elements.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;section id="assignee-s"&gt;
&lt;h3&gt;Assignee(s)&lt;/h3&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;Primary assignee:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;alexisl (aka lxsli, Alexis Lee)&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Other contributors:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;None&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
&lt;section id="work-items"&gt;
&lt;h3&gt;Work Items&lt;/h3&gt;
&lt;blockquote&gt;
&lt;div&gt;&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Support file control files in oac&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support the allow_empty file control key&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add dib-lint check for detached control files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support directory control files in oac&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support the allow_empty directory control key&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the oac README&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="dependencies"&gt;
&lt;h2&gt;Dependencies&lt;/h2&gt;
&lt;p&gt;None.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="testing"&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;This change is easily tested using standard unit test techniques.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="documentation-impact"&gt;
&lt;h2&gt;Documentation Impact&lt;/h2&gt;
&lt;p&gt;The oac README must be updated.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;dl class="simple"&gt;
&lt;dt&gt;There has already been some significant discussion of this feature:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&lt;a class="reference external" href="https://blueprints.launchpad.net/tripleo/+spec/oac-header"&gt;https://blueprints.launchpad.net/tripleo/+spec/oac-header&lt;/a&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;There is a bug open for which an oac control mechanism would be useful:&lt;/dt&gt;&lt;dd&gt;&lt;p&gt;&lt;a class="reference external" href="https://bugs.launchpad.net/os-apply-config/+bug/1258351"&gt;https://bugs.launchpad.net/os-apply-config/+bug/1258351&lt;/a&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/section&gt;
</description><pubDate>Tue, 06 May 2014 00:00:00 </pubDate></item></channel></rss>