Issue Resolution


1. Disk space low - data stopped being ingested


Either:


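Under Index Management, sort by index name ascending and delete the oldest few indices. This assumes the index names end in a date or rollover sequence number, so that name order matches age.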

Or:


Shorten the retention period in the ILM policy for the affected indices so that older data is deleted sooner.
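
Either option can be checked from Dev Tools before anything is changed. The requests below are a minimal sketch, not a definitive procedure; the index name app-logs-2020.01.01 is a hypothetical placeholder and should be replaced with a name taken from the listing.

```
# Show disk usage per data node
GET _cat/allocation?v

# List indices oldest first, with their size on disk
GET _cat/indices?v&s=creation.date&h=index,creation.date.string,store.size

# Delete one old index (hypothetical name; deletion is permanent)
DELETE /app-logs-2020.01.01

# Review the current ILM policies before shortening a retention period
GET _ilm/policy
```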

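Then: under Dev Tools, apply the following two settings changes. Elasticsearch marks indices read-only once a node crosses the flood-stage disk watermark; the first request clears that block and the second clears any explicit write block.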


PUT */_settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": null
    }
  }
}

PUT */_settings
{
  "settings": {
    "index.blocks.write": false
  }
}
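
To confirm that the blocks were cleared, the settings can be read back. This is a sketch under the assumption that enough disk space has been freed; if usage is still above the flood-stage watermark, Elasticsearch will re-apply the read-only block.

```
# List index-level block settings; any index still showing a block set to true needs attention
GET */_settings/index.blocks.*?flat_settings=true

# Confirm that disk usage per node has dropped
GET _cat/allocation?v
```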


2. User access issues


Make sure that the user is a member of the correct AD group.

Make sure that the correct Index Patterns have been created, and that the Role Mapping maps to a Role that has access to indices which match that Index Pattern.
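
The role side of these checks can also be inspected from Dev Tools. A minimal sketch follows; the role name app-role-appname and the index pattern appname-* are hypothetical placeholders, not names taken from this environment.

```
# Show all role mappings, including the AD group rules and the roles they grant
GET _security/role_mapping

# Show the index names and privileges granted by one role (hypothetical role name)
GET _security/role/app-role-appname

# Confirm that indices matching the pattern actually exist (hypothetical pattern)
GET _cat/indices/appname-*?v
```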


3. Data not arriving in indices as expected


Make sure that the Filebeat configmap is prospecting for the correct filename pattern.

Make sure that the conditional at the end of the Logstash-Pipeline config map routes the data to a dedicated pipeline for that App.

Make sure that the dedicated pipeline for that App is configured to output data to an xxx-prod-write and an xxx-nonprod-write index.

Make sure that the ILM policies are properly configured: each policy's write alias must point at an initialized rollover index, and the -prod and -nonprod aliases must match the names above.
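
The alias and ILM checks above can be sketched in Dev Tools as follows. The xxx prefix is the placeholder used in this document, and xxx-prod-000001 is a hypothetical name for the first rollover index; the bootstrap request also assumes that an index template already attaches the ILM policy and rollover alias to new indices.

```
# Check that both write aliases exist and which index each one writes to
GET _cat/aliases/xxx-*-write?v

# Check the ILM state of the indices behind the alias, including any rollover errors
GET xxx-prod-write/_ilm/explain

# Confirm whether documents are arriving
GET xxx-prod-write/_count

# If no rollover index exists yet, one can be initialized like this (hypothetical index name)
PUT xxx-prod-000001
{
  "aliases": {
    "xxx-prod-write": { "is_write_index": true }
  }
}
```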


4. Assign APM Security Permissions to an App AD Group


This procedure enables document-level security for the APM feature so that members of each AD group have access only to their own App's APM data.


Create a new role called apm-role-appname.

Grant access to the apm-* index pattern.

Grant the read and read_cross_cluster privileges.

Toggle the "Grant read privileges to specific documents" option to Enabled.

Paste the following into the "Granted documents query" box, where appname is the name of the new app being onboarded:

  {
    "match": {
      "service.environment": "appname"
    }
  }


Under Role Mappings, edit the Roles which are assigned to the appname AD group and add the apm-role-appname role to the list.
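
The role created through the UI above could also be created from Dev Tools. This is a sketch of an equivalent request using the security role API, with appname replaced throughout by the name of the app; the role mapping is only read here, because writing a mapping through the API replaces its whole role list.

```
# Create the role with document-level security (replace appname throughout)
PUT _security/role/apm-role-appname
{
  "indices": [
    {
      "names": ["apm-*"],
      "privileges": ["read", "read_cross_cluster"],
      "query": {
        "match": {
          "service.environment": "appname"
        }
      }
    }
  ]
}

# Review the existing role mappings before adding the role to the AD group's mapping
GET _security/role_mapping
```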

5. Elasticsearch Pods in crashloop state


This is often caused by SDN issues. The crash-loop backoff wait times cause the pods to start at different times, which leaves them unable to communicate with one another. Restart the Elasticsearch pods manually, one by one, so that they can communicate again.
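
After each restart, it may help to confirm that the node has rejoined before moving on to the next pod. A minimal sketch, assuming Kibana Dev Tools can still reach the cluster:

```
# Wait up to 60 seconds for the cluster to reach at least yellow status
GET _cluster/health?wait_for_status=yellow&timeout=60s

# List the nodes that have joined the cluster
GET _cat/nodes?v
```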