- Insight Global (Jacksonville, FL)
- …operations, or command center roles. * Hands-on with monitoring and observability tools (Datadog, Dynatrace, Splunk, ServiceNow, etc.). * Strong scripting/automation ... skills (Python, PowerShell, Bash). * Knowledge of incident management frameworks (ITIL a plus). Problem-solver with strong communication skills, able to work in high-pressure environments
- Insight Global (Orange, CA)
- …platform reliability, security, and compliance (HIPAA, PHI). * Implement observability, testing, and governance for LLM-based applications. * Maintain infrastructure ... as code (IaC) and CI/CD pipelines. * Integrate open-source and third-party tools (LangChain, Weaviate, Pinecone, Azure OpenAI, Vertex AI). Eligible States for Remote Work: AZ, FL, CA, GA, KS, ID, MO, NV, NC, TN, TX, WY We are a company committed to creating…
- Expedient (Cleveland, OH)
- …auto-scaling to meet performance targets + Monitor & Optimize: Implement comprehensive observability using Elastic Stack; continuously tune for efficiency + Secure & ... Protect: Ensure infrastructure meets strict security and compliance standards + Collaborate & Support: Partner with developers and engineers; translate complex infrastructure into simple terms + Troubleshoot & Improve: Rapidly resolve issues and drive…
- NVIDIA (Santa Clara, CA)
- …cloud platforms (AWS/GCP/Azure), infrastructure as code, CI/CD, and production observability. + Contributions to open-source projects and/or publications; please ... include links to GitHub pull requests, published papers and artifacts. At NVIDIA, we believe artificial intelligence (AI) will fundamentally transform how people live and work. Our mission is to advance AI research and development to create groundbreaking…
- Microsoft Corporation (Redmond, WA)
- …work closely with product teams to enhance availability, reliability, observability, and operability across our planet-scale systems. We prioritize long-term ... platform improvements through engineering over repetitive manual tasks while taking a data-driven approach to making investment decisions. Increasingly, we leverage AI to amplify our ability to scale reliability across Azure. Our teams contribute to product…
- Oracle (Indianapolis, IN)
- …use cases such as **application search**, **log analytics**, and **observability pipelines**. + Deep understanding of **distributed systems architecture**, ... including experience building and maintaining **high-throughput, highly available services** at scale. + Proficient in **high-level programming languages**, particularly **Java and Python**, with a strong emphasis on clean, maintainable, and testable code. +…
- Oracle (Trenton, NJ)
- …media tools). + Ensure services are built for scale, availability, observability, performance, and security, optimized for graphics and rendering pipelines. + ... Collaborate with distributed engineering teams to deliver cloud-native solutions for media production workflows. + Drive operational excellence for GPU-powered services, including performance monitoring, failure analysis, and workload optimization. + Stay…
- Oracle (Nashville, TN)
- …learn. **Responsibilities** + Cloud service design for availability, scalability, observability, and testability. + Implementation, validation and documentation of ... services and their component micro-services. + Stay abreast of emerging technologies and industry best practices, ensuring compliance and driving innovation within the organization. + Work collaboratively to realize and achieve the technical vision of the team. +…
- New York Times (New York, NY)
- …caching, and multi-region operations + Experience troubleshooting and improving observability for a platform distributed across multiple systems + Experience ... deploying and maintaining applications on Kubernetes. This is a hybrid role. #LI-Hybrid REQ-018720 The annual base pay range for this role is between $140,000 and $160,000 USD. The New York Times Company is committed to being the world's best source of…
- Walmart (Bentonville, AR)
- …and Enterprise products + Working knowledge of any observability tools and enterprise monitoring solutions like Dynatrace, AppDynamics, New Relic, ... Prometheus, etc. + Root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance. + Secure the system from issues, be they real, perceived, or notional. **What you'll bring:** +…
Recent Jobs
- Python Risk Model Developer (SQL & Unix)
- Capgemini (New York, NY)
- Manager of Software Engineering, Payroll Processing
- UKG (Lowell, MA)
- Beverage Program Marketing & Activations Manager
- IMI Agency (Washington, DC)
- Senior Software Developer
- Oracle (Harrisburg, PA)