Site Reliability Engineer
1 week ago
We are seeking a highly skilled Site Reliability Engineer - AWS to enhance the reliability, scalability, and security of our cloud infrastructure. The ideal candidate will be responsible for designing, implementing, and maintaining high-availability systems, automating processes, and ensuring seamless operations on AWS. This role requires expertise in DevOps, cloud automation, monitoring, and incident response.

Title: Site Reliability Engineer - AWS
Location: Remote Work
Employment Type: Full-time
Work timings: 24*7 rotational shifts

Responsibilities:
- Design and maintain highly available, scalable, and fault-tolerant AWS infrastructure to ensure system reliability and performance.
- Proactively monitor and troubleshoot system issues, minimizing downtime and optimizing system performance.
- Develop and maintain Infrastructure as Code (IaC) using Terraform, CloudFormation, or AWS CDK to automate deployments and infrastructure management.
- Implement and optimize continuous integration and deployment (CI/CD) pipelines using tools like Jenkins, GitLab CI/CD, or AWS CodePipeline.
- Ensure AWS environments meet security best practices, including IAM policies, network security configurations, and compliance requirements.
- Set up and manage monitoring and logging solutions using tools such as Prometheus, AWS CloudWatch, ELK Stack, and Datadog.
- Identify and address performance bottlenecks through load balancing, caching strategies, and system optimizations.
- Work closely with developers, security teams, and product managers to enhance system architecture and operational efficiency.

Required Skills & Experience
- Strong experience in AWS services such as EC2, Lambda, EKS, S3, SageMaker, DynamoDB, and IAM.
- Expertise in Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Proficiency in CI/CD pipelines using GitHub Actions, Jenkins, or AWS CodePipeline.
- Experience with containerization and orchestration (Docker, Kubernetes, Helm).
- Strong knowledge of monitoring, logging, and alerting tools (CloudWatch, Prometheus, ELK, Datadog).
- Solid Python, Bash, or Golang scripting skills for automation.
- Experience working with ML models in production environments is a plus.
- Familiarity with security best practices (IAM, VPC security, encryption, WAF).
- Strong problem-solving and troubleshooting skills.

Preferred Qualifications
- Experience with MLOps frameworks and AI model deployment.
- Knowledge of AWS AI/ML services like SageMaker, Bedrock, or AI pipelines.
- Hands-on experience with Kafka, Spark, or other big data technologies.

About Techolution:
Techolution is a next-gen consulting firm on track to become one of the most admired brands in the world for "innovation done right". Our purpose is to harness our expertise in novel technologies to deliver more profits for our enterprise clients while helping them deliver a better human experience for the communities they serve.

With that, we are now fully committed to helping our clients build the enterprise of tomorrow by making the leap from Lab Grade AI to Real World AI. Our other focus areas are Enterprise Cloud, Product Innovation (IoT, 3D printing, Robotics), and Real World AI Services (CV, LLM, CNN).

We are honored to have recently received the prestigious Inc 500 Best In Business award, a testament to our commitment to excellence. We were also named AI Solution Provider of the Year by The AI Summit 2023 and were a Platinum sponsor at the Advantage DoD 2024 Symposium, among other recognitions. While we are big enough to be trusted by some of the greatest brands in the world, we are small enough to care about delivering meaningful ROI-generating innovation at a guaranteed price for each client that we serve.

Our thought leader, Luv Tulsidas, wrote and published a book in collaboration with Forbes, “Failing Fast? Secrets to succeed fast with AI”. For more details on the content, see https://www.luvtulsidas.com/

Let's explore further
Uncover our unique AI accelerators with us:
1. Enterprise LLM Studio: Our no-code DIY AI studio for enterprises. Choose an LLM, connect it to your data, and create an expert-level agent in 20 minutes.
2. AppMod.AI: Modernizes ancient tech stacks quickly, achieving over 80% autonomy for major brands.
3. ComputerVision.AI: Offers customizable Computer Vision and Audio AI models, plus DIY tools and a Real-Time Co-Pilot for human-AI collaboration.
4. Robotics and Edge Device Fabrication: Provides comprehensive robotics, hardware fabrication, and AI-integrated edge design services.
5. RLEF AI Platform: Our proven Reinforcement Learning with Expert Feedback (RLEF) approach bridges Lab-Grade AI to Real-World AI.
6. AI Center of Excellence: Establishes an AI Center of Excellence to maximize AI potential and ROI.
7. FaceOpen: AI-powered user identification system using image recognition and deep neural networks, eliminating the need for keys, badges, or fingerprint scanners.

Some videos you may want to watch:
- Computer Vision demo at The AI Summit New York 2023
- Life at Techolution
- GoogleNext 2023
- Ai4 - Artificial Intelligence Conferences 2023
- WaWa - Solving Food Wastage
- Saving lives - Brooklyn Hospital
- Innovation Done Right on Google Cloud
- Techolution featured on Worldwide Business with Kathy Ireland
- Techolution presented by ION World’s Greatest

Visit us at www.techolution.com to learn more about our revolutionary core practices and how we enrich the human experience with technology.
-
Site Reliability Engineer
2 weeks ago
New Delhi, India Tata Consultancy Services Full time
Role: Site Reliability Engineer
Experience: 4 to 7 Years
Locations: Chennai/Pune/Kolkata
-
Site Reliability Engineer
1 week ago
New Delhi, India Datum Technologies Group Full time
Job Title: Site Reliability Engineer (SRE) – Azure & AI
Experience: 7+ years
Work Mode: Hybrid
Work Location: Chennai/Mumbai/Gurgaon
Job Summary: We are looking for an experienced Site Reliability Engineer (SRE) with strong expertise in Microsoft Azure, AI infrastructure, and automation. The ideal candidate will have a solid background in managing cloud...
-
Site Reliability Engineer
3 days ago
New Delhi, India Datum Technologies Group Full time
Job Title: Site Reliability Engineer (SRE) – AWS
Experience: 8+ years
Location: Chennai / Mumbai
Work Mode: Hybrid
Key Skills: AWS, Terraform, Kubernetes, Docker, Grafana, Prometheus, Datadog
Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong AWS experience and a solid background in DevOps, automation, observability, and...
-
Site Reliability Engineer
1 week ago
New Delhi, India Grootan Technologies Full time
About the Role
We are seeking a skilled Site Reliability Engineer (SRE) with 4–5 years of hands-on experience to join our engineering team. In this role, you will be responsible for building and maintaining reliable, scalable, and secure infrastructure to support our applications. You will leverage your expertise in automation, cloud platforms, and...
-
Site Reliability Engineer
3 days ago
New Delhi, India Elios Talent Full time
Site Reliability Engineer
Key Highlights
- Build, automate, and support cloud-native infrastructure powering high-availability platforms
- Contribute to automation-first engineering across AWS, Terraform, CI/CD, and observability tooling
- Improve reliability, uptime, system health, and performance across production environments
- Strengthen DevSecOps...
-
Site Reliability Engineer
5 days ago
New Delhi, India Tata Consultancy Services Full time
Role: Site Reliability Engineer
Location: Chennai/Bangalore/Hyderabad
Experience: 5-11 years
1. Exposure to any APM tool such as Dynatrace, AppDynamics, Splunk, etc.
2. DBA or Infra admin
3. Gremlin, Chaos Monkey, Simian Army, or Litmus expertise
4. Exposure to ITSM tools such as ServiceNow, etc.
5. Understanding of Automation and Chaos Engineering
6. Exposure to DevOps tools...
-
Site Reliability Engineer
5 days ago
New Delhi, India VXI Global Solutions Full time
We are looking for a Site Reliability Engineer with 3+ years of experience to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus and Grafana, along with exposure to SolarWinds. You should be comfortable working with metrics, logs, and...
-
Site Reliability Engineer
3 days ago
New Delhi, India VXI Global Solutions Full time
We are looking for a Site Reliability Engineer with 3+ years of experience to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus and Grafana, along with exposure to SolarWinds. You should be comfortable working with metrics, logs, and...
-
Site Reliability Engineer
1 day ago
New Delhi, India Andor Tech Full time
Hiring!!
About AndorTech
AndorTech is a global IT services and consulting firm founded in 2009, headquartered in Bangalore. The company specializes in software engineering, AI-enabled IT services, application support, analytics, and test automation. With a presence across India, the USA, Europe, and the UAE, AndorTech partners with Global Capability Centers...
-
Site Reliability Engineer
3 days ago
New Delhi, India Andor Tech Full time
Hiring!!
About AndorTech
AndorTech is a global IT services and consulting firm founded in 2009, headquartered in Bangalore. The company specializes in software engineering, AI-enabled IT services, application support, analytics, and test automation. With a presence across India, the USA, Europe, and the UAE, AndorTech partners with Global Capability Centers...