Data Scientist
- Schlumberger (Houston, TX)
Job Description
Build, train, and deploy large-scale, self-supervised "foundation" models that learn rich representations of time series and sequential sensor data, as well as textual and vision data. These models are then fine-tuned for tasks such as anomaly/event detection, predictive maintenance, forecasting, classification, and multi-modal sensor fusion in industrial and scientific applications.
Data/Signal Processing
• Time Series & Sequential Data: processing, augmentation, and feature engineering for financial, industrial, IoT, medical, or other sensor streams (univariate and multivariate time series).
• Sensor Data Analysis: expertise with diverse sensor modalities (e.g., accelerometers, temperature, vibration, audio, images), sampling rates, synchronization, and real-world noise/artifact handling.
• Multi-Modality Learning: integrating heterogeneous data types (time series, images, text, audio, structured) into robust deep learning architectures; cross-modal representation learning.
Machine Learning & Foundation Model Expertise
• Self-supervised and Semi-supervised Learning: time series foundation models, masked modeling, contrastive methods, temporal predictive coding, multimodal alignment and fusion.
• Model Architectures: sequence models (RNNs, GRU/LSTM, TCN), 1D/2D/3D CNNs, Transformers (BERT, ViT, TimeSformer), graph neural networks, diffusion/generative models, multi-modal/fusion encoders.
• Transfer Learning & Fine-Tuning at Scale: prompt/adapter-based strategies, temporal domain adaptation, few-shot learning for specialized tasks.
• Evaluation Metrics: regression/classification (MSE, F1, AUC), time series similarity (DTW, correlation), event detection/segmentation (IoU, accuracy), business/end-user KPIs.
Software & Infrastructure
• Programming: expert Python (NumPy, SciPy, Pandas), C++/CUDA for custom kernels and high-performance preprocessing.
• Deep Learning Frameworks: PyTorch (Lightning, Distributed), TensorFlow/Keras, JAX/Flax.
• Large-scale Training: multi-GPU, multi-node clusters, mixed-precision, ZeRO optimization, scalable data loaders for long sequences.
• Data Engineering: robust pipelines for ingesting, cleaning, segmenting, and aligning large-scale, time-synchronized multi-sensor datasets.
Mathematical & Algorithmic Foundations
• Linear Algebra, Probability & Statistics, Optimization (stochastic, convex/non-convex, Bayesian).
• Signal Processing: Fourier/wavelet analysis, filters (Kalman, Savitzky–Golay), resampling, noise modeling.
• Numerical Methods: ODE/PDE solvers, inverse problems, regularization, time-frequency methods for complex systems.
Collaboration & Communication
• Cross-disciplinary teamwork with domain experts, engineers, product owners, and end-users from industrial, scientific, or medical backgrounds.
• Clear presentation of complex model behaviors (interpretability, attention analysis), uncertainty quantification, and value impact.
+ MS or Ph.D. in computer science, data science, AI, or a related field.
+ 3+ years of relevant experience in data science, AI, or a related field.