- Own and improve our apps running on Kubernetes:
Build and maintain Helm charts for our services
Standardize deployment patterns, configuration, and release strategies
Ensure reliability, scalability, and security of workloads running on Kubernetes
- Build and evolve CI/CD pipelines in GitLab:
Design flexible pipelines for build, test, security scanning, and deployment
Introduce best practices (versioning, feature flags, rollbacks, approvals, etc.)
Continuously improve speed, reliability, and developer experience
- Set up and improve observability:
Use Elasticsearch, Prometheus, and related tooling to create useful dashboards and alerts
Help teams understand logs, metrics, and traces — turn data into decisions
Build early-warning detection frameworks and validation/simulation practices to surface failure modes before production
- Research & experiment:
Investigate new tools, patterns, and AWS services; compare alternatives
Build proofs of concept and validate ideas before rolling them out to production
Bring proposals to the team, explain trade-offs, and lead implementation
Apply what you learn to harder reliability, scalability, and automation problems, and show progress through delivered outcomes
- Document and share knowledge:
Write clear runbooks, architecture/infra documentation, and “how-to” guides
Make sure others can understand, reproduce, and maintain what you build
Prevent long-term system “decay” by simplifying complex setups and establishing sustainable best practices
- Collaborate across teams:
Work closely with developers, security, product, and operations
Participate in incident response, root cause analysis, and prevention work
Align infrastructure decisions with product goals, cost efficiency, and deployment impact
Proactively flag technical risks that affect availability or performance