2021-10-05 UPDATED QUESTION AND TEXT AFTER MORE ANALYSIS, STRIPPED DOWN TO MINIMAL CASE

Short description

A Nomad / Consul cluster is running, with Traefik (with minimal configuration) as a system task on each Nomad client. There are 3 Nomad servers, 3 Consul servers, 3 Nomad clients and 3 Gluster servers at this point. The set-up is very similar to this article series on setting up a Nomad / Consul cluster.

Basic images and sites work well.

The issue

I've started porting the first bigger PHP-based site (one whose pages load a large number of dependencies) to this cluster and am running into a weird issue that I have pinpointed but cannot resolve properly.

The task loads well and registers as up in Consul, Traefik and Nomad. Small pages (with few dependencies) work well.

Whenever a page has too many dependencies to load, Apache stalls those specific connections.

When I open a fresh Incognito browser window and go to the URL, the main page and around 10-15 of the dependencies load. The others stay in a pending state in the browser, and the browser status keeps 'spinning' (as in loading). Closing the window and opening a new one allows me to repeat the process.

I've nailed down the issue to the fact that the PHP sessions directory is mapped (via Docker) to a directory on a GlusterFS mount.

Moving the volume mapping to a different, host-local directory on the same server removes the issue and the site loads as it should.

Conclusion: The interaction between Docker volumes and the Gluster mount is causing issues under 'heavy load'. With just a few requests everything works well. With lots of requests to access the PHP session file things stall and do not recover.

Question: This is probably caused by either a Gluster configuration issue or the way the mount is configured in /etc/fstab. Please help to fix this issue!

Isolation

The PHP sessions directory is set to /var/php_session in the image's PHP config and mapped in Nomad / Docker to /data/storage/test/php_sessions.

The /data/storage/test/php_sessions directory is owned by user 20000 to make sure all nodes have access to the same PHP sessions:

client:/data/storage/test$ ls -ln .
drwxr-xr-x  2 20000 20000     6 Oct  5 14:53 php_sessions
drwxr-xr-x  2 20000 20000     6 Oct  5 14:53 upload
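
The same ownership check can be scripted so it is easy to repeat on every client. This is only a minimal sketch, assuming Python 3 is available on the Nomad clients; `check_owner` is a hypothetical helper name, and the default uid/gid of 20000 simply mirrors the listing above.

```python
import os
import stat


def check_owner(path, uid=20000, gid=20000):
    """Return a list of problems found for a directory; an empty list means OK."""
    info = os.stat(path)
    problems = []
    if not stat.S_ISDIR(info.st_mode):
        problems.append(f"{path} is not a directory")
    if info.st_uid != uid:
        problems.append(f"{path} has uid {info.st_uid}, expected {uid}")
    if info.st_gid != gid:
        problems.append(f"{path} has gid {info.st_gid}, expected {gid}")
    if not info.st_mode & stat.S_IWUSR:
        problems.append(f"{path} is not writable by its owner")
    return problems
```

Calling `check_owner("/data/storage/test/php_sessions")` on each client should return an empty list if the directory looks the same from every node.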

When changing the Nomad volume mapping (in /etc/nomad/nomad.hcl) from:

client {
  host_volume "test-sessions" {
    path      = "/data/storage/test/php_sessions"
    read_only = false
  }
}

to

client {
  host_volume "test-sessions" {
    path      = "/tmp/php_sessions"
    read_only = false
  }
}

(and making sure /tmp/php_sessions is also owned by user 20000), everything works again.
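
To compare the two directories outside of Apache and Docker, a lock-contention probe could be run against each path. The sketch below assumes PHP's default `files` session handler, which holds an exclusive `flock()` on the session file while a request uses the session. `lock_probe` and the file name `sess_lockprobe` are made up for illustration, and the worker and round counts are arbitrary.

```python
import fcntl
import os
import time
from concurrent.futures import ThreadPoolExecutor


def lock_probe(directory, workers=20, rounds=10, hold_seconds=0.001):
    """Hammer one file with exclusive flock() calls and return timing data."""
    path = os.path.join(directory, "sess_lockprobe")  # hypothetical file name
    open(path, "a").close()  # make sure the file exists

    def worker(_index):
        waits = []
        for _round in range(rounds):
            started = time.monotonic()
            # Each open() gets its own open file description, so the
            # locks conflict with each other even inside one process.
            with open(path, "r+") as handle:
                fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until granted
                waits.append(time.monotonic() - started)
                time.sleep(hold_seconds)  # stand-in for the request's work
                fcntl.flock(handle, fcntl.LOCK_UN)
        return waits

    with ThreadPoolExecutor(max_workers=workers) as pool:
        all_waits = [w for waits in pool.map(worker, range(workers)) for w in waits]

    os.remove(path)
    return {
        "locks": len(all_waits),
        "max_wait": max(all_waits),
        "total_wait": sum(all_waits),
    }
```

Running `lock_probe("/data/storage/test/php_sessions")` and `lock_probe("/tmp/php_sessions")` as user 20000 on the same client should show whether lock waits on the Gluster mount are much longer. If the call never returns on the Gluster path, that would match the stalled connections.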

Detailed data (More on request)

Contents of /etc/fstab:

LABEL=cloudimg-rootfs   /    ext4   defaults    0 1
LABEL=UEFI  /boot/efi   vfat    defaults    0 1
gluster-01,gluster-02,gluster-03:/storage       /data/storage   glusterfs   _netdev,defaults,direct-io-mode=disable,rw

Dockerfile for site image:

FROM php:7.4.1-apache
ENV APACHE_DOCUMENT_ROOT /var/www/htdocs
WORKDIR /var/www

RUN docker-php-ext-install mysqli pdo_mysql

# Make Apache root configurable
RUN sed -ri -e 's!/var/www/html!${APACHE_DOCUMENT_ROOT}!g' /etc/apache2/sites-available/*.conf
RUN sed -ri -e 's!/var/www/!${APACHE_DOCUMENT_ROOT}!g' /etc/apache2/apache2.conf /etc/apache2/conf-available/*.conf

# Listen on port 1080 by default for non-root user
RUN sed -ri 's/Listen 80/Listen 1080/g' /etc/apache2/ports.conf
RUN sed -ri 's/:80/:1080/g' /etc/apache2/sites-enabled/*

# Use own config
COPY data/000-default.conf /etc/apache2/sites-enabled/

# Enable Production ini
RUN cp /usr/local/etc/php/php.ini-production /usr/local/etc/php/php.ini

RUN a2enmod rewrite && a2enmod remoteip

COPY --from=composer:latest /usr/bin/composer /usr/local/bin/composer
COPY --chown=www-data:www-data . /var/www

RUN /usr/local/bin/composer --no-cache --no-ansi --no-interaction install

# Finally add security changes
COPY data/changes.ini /usr/local/etc/php/conf.d/

The Nomad job file, stripped down to what triggers the issue:

job "test" {
  datacenters = ["dc1"]

  group "test-staging" {
    count = 1

    network {
      port "php_http" {
        to = 1080
      }
    }

    volume "test-sessions" {
      type      = "host"
      read_only = false
      source    = "test-sessions"
    }

    volume "test-upload" {
      type      = "host"
      read_only = false
      source    = "test-upload"
    }

    service {
      name = "test-staging"
      port = "php_http"

      tags = [
        "traefik.enable=true",
        "traefik.http.routers.test.php_staging.rule=Host(`staging.xxxxxx.com`)",
      ]

      check {
        type     = "tcp"
        port     = "php_http"
        interval = "5s"
        timeout  = "2s"
      }
    }

    task "test" {
      driver = "docker"
      user   = "20000"

      config {
        image = "docker-repo:5000/test/test:latest"
        ports = ["php_http"]
      }

      volume_mount {
        volume      = "test-sessions"
        destination = "/var/php_sessions"
        read_only   = false
      }

      volume_mount {
        volume      = "test-upload"
        destination = "/var/upload"
        read_only   = false
      }

      template {
        data = <<EOF
1.2.3.4
EOF

        destination = "local/trusted-proxies.lst"
      }
    }
  }
}

Paul