Roles and Responsibilities:
- Maintain and monitor Splunk infrastructure (Search Heads, Indexers, Forwarders, Deployment Server, Cluster Master, etc.).
- Ensure uptime and system health via monitoring, tuning, and log analysis (including introspection and metrics logs).
- Manage indexing performance and storage usage: data retention, index lifecycle, bucket management.
- Generate and check reports from the system to ensure the system and agents are working as intended.
- Perform checks, and troubleshoot if necessary, to ensure that the Splunk forwarders (agents) are working and can send logs back to the Splunk systems.
- Perform checks, and troubleshoot if necessary, to ensure the Splunk systems can receive logs from sources such as CloudWatch or syslog servers.
- Integrate Splunk with the Authority's systems and processes to perform real-time monitoring and alert when the Splunk infrastructure is not working well (e.g. log breaks, disconnected agents, search heads hung from insufficient resources, etc.), so that issues can be attended to early.
- Fine-tune Splunk rules according to the Authority’s request.
- Perform parser validation or write new custom parsers according to the Authority’s request.
- Work closely with the Authority’s SOC to ensure Splunk supports threat detection, auditing, and incident response use cases.
- Change the passwords for all privileged and service accounts for the Splunk systems regularly.
- Ensure the Splunk systems are working as intended during the Authority’s periodic BCP and DR exercises.
- Investigate problems and provide assistance to triage issues.
- Correct defects in the System, including temporary corrections or workarounds until permanent fixes or updates are available.
- Prepare incident reports, including the root cause analysis and the necessary resolution.
- Track and report issues, support cases and incident resolutions on a weekly basis.
- Deploy and test system changes in the Non-Production environments when required.
- Demonstrate that System functionality and performance are not degraded.
- Implement the system changes into the Production environment upon the Authority’s acceptance of the testing results.
- Implement additional use cases, design and develop reports, and tune detections to reduce false positives and false negatives.
- Monitor security advisories, new releases, notifications and maintenance expiry dates for all Software used in the System and assess the impact, if any.
- Recommend to the Authority the best course of action to take and provide all relevant documentation.
- If the issue arises from a security vulnerability or software incompatibility, the RE shall evaluate and implement fixes to address the vulnerability or incompatibility.
- Check and remediate findings from the Authority’s periodic vulnerability and compliance scans.
- Track and update the Authority on the DLP End of Life (EOL) and End of Support (EOS) and plans to maintain product supportability.