Reinforcement Learning and Optimal Control for IRRBB Hedging Under Uncertainty
This notebook builds an end-to-end example of hedging interest rate risk in the banking book (IRRBB), measured through net interest income (NII), using term structure models, statistical time series, control theory, and reinforcement learning:
- Yield curve data (GSW): download + clean a panel of zero-coupon yields.
- Term structure model:
  - Start with Diebold–Li (DNS) factors estimated by cross-sectional regression (OLS).
  - Put the model into state-space form and apply Kalman filtering/smoothing.
  - Extend to AFNS (Arbitrage-Free Nelson–Siegel) by adding the no-arbitrage yield adjustment.
- Banking-book NII model: build a simplified balance sheet and compute NII using representative asset/liability repricing rates plus a hedge instrument.
- Dynamic hedging:
  - Classical control (LQ) under quadratic objectives (risk vs trading/inventory penalties).
  - Reinforcement Learning (SAC), first under the same quadratic setting, then under L1 transaction costs where LQ is no longer optimal.
- Stress testing & comparison: evaluate Unhedged vs LQ vs RL across baseline and stress scenarios with tables and plots.
The goal is a controlled comparison: show where classical methods dominate (linear–quadratic world) and where RL becomes valuable (realistic frictions such as L1 costs).
1) Yield curve dataset (GSW) and preprocessing
We use the Gurkaynak–Sack–Wright (GSW) U.S. Treasury zero-coupon curve because it is a clean, widely used academic dataset with a long history and many maturities. We use monthly frequency for this exercise.
In the next cells we:
- download (or load cached) GSW yields,
- rename the SVENYxx columns to maturities in years and convert yields from percent to decimals,
- store both daily and monthly versions to keep the ETL step reproducible.
from pathlib import Path
import pandas as pd
import requests
from io import StringIO
import matplotlib.pyplot as plt
import numpy as np
DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)
RAW_CSV_PATH = DATA_DIR / "feds200628.csv" # local cached file
DAILY_ZC_CSV_PATH = DATA_DIR / "gsw_zero_coupon_daily.csv"
MONTHLY_ZC_CSV_PATH = DATA_DIR / "gsw_zero_coupon_monthly.csv"
GSW_URL = "https://www.federalreserve.gov/data/yield-curve-tables/feds200628.csv"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/127.0.0.1 Safari/537.36"
}
if not RAW_CSV_PATH.exists():
    print("Local GSW file not found. Downloading from Fed...")
    # A timeout keeps the cell from hanging if the server does not respond
    resp = requests.get(GSW_URL, headers=headers, timeout=60)
    resp.raise_for_status()
    # Some versions require skipping first 9 rows, but we write the raw text first
    with open(RAW_CSV_PATH, "w", encoding="utf-8") as f:
        f.write(resp.text)
    print(f"Saved raw GSW CSV to {RAW_CSV_PATH.resolve()}")
else:
    print("Local GSW file already exists. Skipping download.")
Local GSW file already exists. Skipping download.
print("Loading local GSW CSV...")
df_raw = pd.read_csv(RAW_CSV_PATH, skiprows=9)
print("Raw shape:", df_raw.shape)
df_raw.head()
Loading local GSW CSV...
Raw shape: (16853, 100)
| | Date | BETA0 | BETA1 | BETA2 | BETA3 | SVEN1F01 | SVEN1F04 | SVEN1F09 | SVENF01 | SVENF02 | ... | SVENY23 | SVENY24 | SVENY25 | SVENY26 | SVENY27 | SVENY28 | SVENY29 | SVENY30 | TAU1 | TAU2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1961-06-14 | 3.917606 | -1.277955 | -1.949397 | 0.0 | 3.8067 | 3.9562 | NaN | 3.5492 | 3.8825 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.339218 | -999.99 |
| 1 | 1961-06-15 | 3.978498 | -1.257404 | -2.247617 | 0.0 | 3.8694 | 4.0183 | NaN | 3.5997 | 3.9460 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.325775 | -999.99 |
| 2 | 1961-06-16 | 3.984350 | -1.429538 | -1.885024 | 0.0 | 3.8634 | 4.0242 | NaN | 3.5957 | 3.9448 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.348817 | -999.99 |
| 3 | 1961-06-19 | 4.004379 | -0.723311 | -3.310743 | 0.0 | 3.9196 | 4.0447 | NaN | 3.6447 | 3.9842 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.282087 | -999.99 |
| 4 | 1961-06-20 | 3.985789 | -0.900432 | -2.844809 | 0.0 | 3.8732 | 4.0257 | NaN | 3.5845 | 3.9552 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.310316 | -999.99 |
5 rows × 100 columns
df_raw.columns
Index(['Date', 'BETA0', 'BETA1', 'BETA2', 'BETA3', 'SVEN1F01', 'SVEN1F04',
'SVEN1F09', 'SVENF01', 'SVENF02', 'SVENF03', 'SVENF04', 'SVENF05',
'SVENF06', 'SVENF07', 'SVENF08', 'SVENF09', 'SVENF10', 'SVENF11',
'SVENF12', 'SVENF13', 'SVENF14', 'SVENF15', 'SVENF16', 'SVENF17',
'SVENF18', 'SVENF19', 'SVENF20', 'SVENF21', 'SVENF22', 'SVENF23',
'SVENF24', 'SVENF25', 'SVENF26', 'SVENF27', 'SVENF28', 'SVENF29',
'SVENF30', 'SVENPY01', 'SVENPY02', 'SVENPY03', 'SVENPY04', 'SVENPY05',
'SVENPY06', 'SVENPY07', 'SVENPY08', 'SVENPY09', 'SVENPY10', 'SVENPY11',
'SVENPY12', 'SVENPY13', 'SVENPY14', 'SVENPY15', 'SVENPY16', 'SVENPY17',
'SVENPY18', 'SVENPY19', 'SVENPY20', 'SVENPY21', 'SVENPY22', 'SVENPY23',
'SVENPY24', 'SVENPY25', 'SVENPY26', 'SVENPY27', 'SVENPY28', 'SVENPY29',
'SVENPY30', 'SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05',
'SVENY06', 'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11',
'SVENY12', 'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17',
'SVENY18', 'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23',
'SVENY24', 'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29',
'SVENY30', 'TAU1', 'TAU2'],
dtype='object')
# 1. Normalize column names (just in case)
df = df_raw.copy()
# Make sure the date column is correctly named
date_col_candidates = ["Date", "date", "DATE"]
date_col = None
for c in date_col_candidates:
    if c in df.columns:
        date_col = c
        break
if date_col is None:
    raise ValueError(f"Could not find a date column in raw data. Columns: {df.columns}")
df[date_col] = pd.to_datetime(df[date_col])
df = df.sort_values(by=date_col)
# 2. Select zero-coupon columns (SVENYxx)
zc_cols = [c for c in df.columns if c.startswith("SVENY")]
print("Zero-coupon columns:", zc_cols)
if not zc_cols:
    raise ValueError("No zero-coupon (SVENYxx) columns found. Check raw CSV format.")
# 3. Keep Date + zero-coupon columns
df_zc = df[[date_col] + zc_cols].copy()
df_zc = df_zc.rename(columns={date_col: "date"})
df_zc.set_index("date", inplace=True)
df_zc.sort_index(inplace=True)
print("Zero-coupon daily data (raw units):")
df_zc.head()
Zero-coupon columns: ['SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05', 'SVENY06', 'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11', 'SVENY12', 'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17', 'SVENY18', 'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23', 'SVENY24', 'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29', 'SVENY30']
Zero-coupon daily data (raw units):
| date | SVENY01 | SVENY02 | SVENY03 | SVENY04 | SVENY05 | SVENY06 | SVENY07 | SVENY08 | SVENY09 | SVENY10 | ... | SVENY21 | SVENY22 | SVENY23 | SVENY24 | SVENY25 | SVENY26 | SVENY27 | SVENY28 | SVENY29 | SVENY30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1961-06-14 | 2.9825 | 3.3771 | 3.5530 | 3.6439 | 3.6987 | 3.7351 | 3.7612 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-06-15 | 2.9941 | 3.4137 | 3.5981 | 3.6930 | 3.7501 | 3.7882 | 3.8154 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-06-16 | 3.0012 | 3.4142 | 3.5994 | 3.6953 | 3.7531 | 3.7917 | 3.8192 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-06-19 | 2.9949 | 3.4386 | 3.6252 | 3.7199 | 3.7768 | 3.8147 | 3.8418 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-06-20 | 2.9833 | 3.4101 | 3.5986 | 3.6952 | 3.7533 | 3.7921 | 3.8198 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 30 columns
# The SVENYxx convention:
# SVENY01 -> 1-year zero-coupon, SVENY02 -> 2-year, ..., typically up to 30.
# We'll map 'SVENY01' -> 1.0, 'SVENY02' -> 2.0, etc., and rename columns to "1.0","2.0",...
maturities_years = []
for c in zc_cols:
    # The suffix after "SVENY" is the maturity in years: '01', '02', etc.
    # Some files may have 3 digits if > 99, but for Treasuries we expect <= 30.
    suffix = c.replace("SVENY", "")
    try:
        mat = int(suffix)
    except ValueError as exc:
        raise ValueError(f"Unexpected SVENY column name format: {c}") from exc
    maturities_years.append(mat)
# New column names as string years, e.g. "1.0", "2.0", "3.0", ...
new_cols = [f"{mat:.1f}" for mat in maturities_years]
zc_renaming = dict(zip(zc_cols, new_cols))
df_zc = df_zc.rename(columns=zc_renaming)
# Convert from percent to decimals
df_zc = df_zc.astype(float) / 100.0
print("Zero-coupon daily yields in decimals:")
df_zc.head()
Zero-coupon daily yields in decimals:
| date | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 | 10.0 | ... | 21.0 | 22.0 | 23.0 | 24.0 | 25.0 | 26.0 | 27.0 | 28.0 | 29.0 | 30.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1961-06-14 | 0.029825 | 0.033771 | 0.035530 | 0.036439 | 0.036987 | 0.037351 | 0.037612 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-06-15 | 0.029941 | 0.034137 | 0.035981 | 0.036930 | 0.037501 | 0.037882 | 0.038154 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-06-16 | 0.030012 | 0.034142 | 0.035994 | 0.036953 | 0.037531 | 0.037917 | 0.038192 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-06-19 | 0.029949 | 0.034386 | 0.036252 | 0.037199 | 0.037768 | 0.038147 | 0.038418 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-06-20 | 0.029833 | 0.034101 | 0.035986 | 0.036952 | 0.037533 | 0.037921 | 0.038198 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 30 columns
df_zc.to_csv(DAILY_ZC_CSV_PATH, index=True)
# Resample to monthly (end-of-month yields)
df_zc_monthly = df_zc.resample("ME").last()
# Forward-fill any gaps (holidays etc.)
df_zc_monthly = df_zc_monthly.ffill()
# Drop rows that are completely NaN (if any)
df_zc_monthly = df_zc_monthly.dropna(how="all")
print("Monthly zero-coupon yields (decimals):")
df_zc_monthly.head()
Monthly zero-coupon yields (decimals):
| date | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 | 10.0 | ... | 21.0 | 22.0 | 23.0 | 24.0 | 25.0 | 26.0 | 27.0 | 28.0 | 29.0 | 30.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1961-06-30 | 0.029011 | 0.032795 | 0.035036 | 0.036316 | 0.037109 | 0.037640 | 0.038020 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-07-31 | 0.027780 | 0.032304 | 0.035068 | 0.036787 | 0.037907 | 0.038678 | 0.039234 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-08-31 | 0.029863 | 0.033990 | 0.036481 | 0.037919 | 0.038812 | 0.039412 | 0.039841 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-09-30 | 0.029358 | 0.033250 | 0.035412 | 0.036661 | 0.037442 | 0.037968 | 0.038345 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1961-10-31 | 0.028936 | 0.032396 | 0.034616 | 0.036087 | 0.037096 | 0.037813 | 0.038339 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 30 columns
df_zc_monthly.to_csv(MONTHLY_ZC_CSV_PATH, index=True)
print(f"Saved monthly zero-coupon panel to: {MONTHLY_ZC_CSV_PATH.resolve()}")
Saved monthly zero-coupon panel to: C:\Users\thoma\Desktop\portfolio projects\P3 - optimal NII hedging\data\gsw_zero_coupon_monthly.csv
# Plot a few maturities to check series look reasonable
sample_mats = ["1.0", "5.0", "10.0", "30.0"]
sample_mats = [m for m in sample_mats if m in df_zc_monthly.columns]
df_zc_monthly[sample_mats].plot(figsize=(10, 5))
plt.title("GSW Zero-Coupon Yields (Monthly)")
plt.xlabel("Date")
plt.ylabel("Yield (decimal)")
plt.grid(True)
plt.show()
2) DNS (Diebold–Li) factor model: cross-sectional OLS
We start with the standard dynamic Nelson–Siegel representation of the yield curve as a function of time to maturity $\tau$:
$ y_t(\tau) = \beta_{1,t} + \beta_{2,t}\left(\frac{1-e^{-\lambda \tau}}{\lambda \tau}\right) + \beta_{3,t}\left(\frac{1-e^{-\lambda \tau}}{\lambda \tau}-e^{-\lambda \tau}\right) + \varepsilon_{t} $
Parameters:
- $\beta_{1,t}$: level
- $\beta_{2,t}$: slope
- $\beta_{3,t}$: curvature
- $\lambda$: controls the maturity where curvature loads most strongly
At each date $t$, the factors can be estimated by OLS across maturities. This gives a fast, transparent baseline estimate of the latent curve factors.
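To make the role of $\lambda$ concrete, the sketch below locates the maturity at which the curvature loading peaks. It is an illustration only: `curvature_loading` and `peak_maturity` are hypothetical helpers, and the grid range and resolution are arbitrary choices. The point it makes is that $\lambda$ and $\tau$ must be expressed in reciprocal units, since only the product $\lambda\tau$ enters the loadings.

```python
import numpy as np

def curvature_loading(tau, lam):
    """Nelson-Siegel curvature loading for maturities tau > 0."""
    x = lam * np.asarray(tau, dtype=float)
    return (1.0 - np.exp(-x)) / x - np.exp(-x)

def peak_maturity(lam, tau_max=600.0, n_grid=600_001):
    """Maturity (in the units implied by lam) where the curvature loading peaks."""
    tau = np.linspace(1e-6, tau_max, n_grid)
    return float(tau[np.argmax(curvature_loading(tau, lam))])

# lam * tau at the peak is a constant (about 1.79), so the peak sits near 1.79 / lam.
for lam in (0.0609, 12 * 0.0609):
    print(f"lambda={lam:.4f} -> peak at tau ~ {peak_maturity(lam):.2f}")
```

Read with maturities in years, $\lambda = 0.0609$ puts the curvature peak near 29 years, whereas $12 \times 0.0609 \approx 0.73$ puts it near 2.5 years. The value 0.0609 is usually quoted for maturities in months, so it is worth checking which convention a given dataset uses.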
from statsmodels.tsa.api import VAR
DATA_DIR = Path("data")
df_yields = pd.read_csv(DATA_DIR / "gsw_zero_coupon_monthly.csv",
index_col=0, parse_dates=True)
# only get data from 1990 onwards
df_yields = df_yields["1990-01-01":]
print(df_yields.shape)
df_yields.head()
(433, 30)
| date | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 | 10.0 | ... | 21.0 | 22.0 | 23.0 | 24.0 | 25.0 | 26.0 | 27.0 | 28.0 | 29.0 | 30.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1990-01-31 | 0.080998 | 0.081567 | 0.082178 | 0.082620 | 0.082924 | 0.083137 | 0.083292 | 0.083409 | 0.083500 | 0.083573 | ... | 0.083916 | 0.083930 | 0.083943 | 0.083955 | 0.083966 | 0.083976 | 0.083985 | 0.083994 | 0.084002 | 0.084010 |
| 1990-02-28 | 0.080925 | 0.082517 | 0.083358 | 0.083810 | 0.084082 | 0.084263 | 0.084392 | 0.084489 | 0.084564 | 0.084624 | ... | 0.084907 | 0.084918 | 0.084929 | 0.084939 | 0.084948 | 0.084956 | 0.084964 | 0.084971 | 0.084977 | 0.084984 |
| 1990-03-31 | 0.083192 | 0.084778 | 0.085420 | 0.085639 | 0.085698 | 0.085701 | 0.085687 | 0.085670 | 0.085654 | 0.085640 | ... | 0.085570 | 0.085567 | 0.085565 | 0.085562 | 0.085560 | 0.085558 | 0.085556 | 0.085554 | 0.085553 | 0.085551 |
| 1990-04-30 | 0.085684 | 0.087799 | 0.088629 | 0.088956 | 0.089093 | 0.089156 | 0.089190 | 0.089210 | 0.089225 | 0.089235 | ... | 0.089284 | 0.089286 | 0.089288 | 0.089290 | 0.089291 | 0.089293 | 0.089294 | 0.089295 | 0.089296 | 0.089297 |
| 1990-05-31 | 0.081363 | 0.083009 | 0.084012 | 0.084568 | 0.084881 | 0.085058 | 0.085157 | 0.085208 | 0.085232 | 0.085239 | ... | 0.085137 | 0.085130 | 0.085123 | 0.085117 | 0.085111 | 0.085106 | 0.085101 | 0.085096 | 0.085092 | 0.085088 |
5 rows × 30 columns
def dl_loadings(maturities: np.ndarray, lam: float) -> np.ndarray:
    """Nelson–Siegel loadings (level, slope, curvature), one row per maturity."""
    tau = maturities
    lam_tau = lam * tau
    with np.errstate(divide="ignore", invalid="ignore"):
        f1 = np.ones_like(tau)
        f2 = (1 - np.exp(-lam_tau)) / lam_tau
        f3 = f2 - np.exp(-lam_tau)
    # Limits as tau -> 0: slope loading -> 1, curvature loading -> 0
    f2 = np.where(tau == 0, 1.0, f2)
    f3 = np.where(tau == 0, 0.0, f3)
    return np.column_stack([f1, f2, f3])

def estimate_diebold_li_factors(df_yields: pd.DataFrame, lam: float = 0.0609):
    """Date-by-date cross-sectional OLS of yields on the Nelson–Siegel loadings."""
    maturities = np.array([float(c) for c in df_yields.columns])
    sort_idx = np.argsort(maturities)
    mats_sorted = maturities[sort_idx]
    df_sorted = df_yields.iloc[:, sort_idx]
    X = dl_loadings(mats_sorted, lam)
    betas = []
    dates = []
    for date, row in df_sorted.iterrows():
        y = row.values.astype(float)
        mask = ~np.isnan(y)
        X_m = X[mask]
        y_m = y[mask]
        # Need at least as many observed maturities as factors
        if y_m.shape[0] < 3:
            continue
        beta_hat = np.linalg.inv(X_m.T @ X_m) @ X_m.T @ y_m
        betas.append(beta_hat)
        dates.append(date)
    return pd.DataFrame(betas, index=dates, columns=["level","slope","curvature"]).sort_index()
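# Canonical Diebold–Li value. It was calibrated for maturities in months;
# with maturities in years (as here) the equivalent would be 12 * 0.0609 ≈ 0.73.
lam = 0.0609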
factors_ols = estimate_diebold_li_factors(df_yields, lam)
factors_ols.to_csv(DATA_DIR / "dl_factors_ols.csv")
print(factors_ols.head())
               level     slope  curvature
1990-01-31  0.076733  0.004404   0.016872
1990-02-28  0.075251  0.006550   0.021459
1990-03-31  0.079159  0.005193   0.012616
1990-04-30  0.080233  0.006865   0.018714
1990-05-31  0.072812  0.009628   0.024918
3) Time-series dynamics for the factors
In general form, a state-space model consists of two equations:
State (Transition) Equation
$ \mathbf{x}_{t+1} = f(\mathbf{x}_t) + \boldsymbol{\eta}_{t+1}, \qquad \boldsymbol{\eta}_{t+1} \sim \mathcal{N}(0, Q) $
Measurement (Observation) Equation
$ \mathbf{y}_t = g(\mathbf{x}_t) + \boldsymbol{\varepsilon}_t, \qquad \boldsymbol{\varepsilon}_t \sim \mathcal{N}(0, R) $
To move from “static cross-sectional fits” to a full state-space model, we first need a law of motion for the factors, i.e., a transition equation:
$ B_{t+1} = A B_{t} + \eta_{t+1}, \quad \eta_{t+1} \sim \mathcal{N}(0, Q), $
where $B_t = [l_t, s_t, c_t]^\top = [\beta_{1,t}, \beta_{2,t}, \beta_{3,t}]^\top$ collects our DNS factors (level, slope and curvature). The factors depend on time only; the dependence on maturity $\tau$ enters through the loadings in the measurement equation.
A VAR(1) is a natural first choice for the dynamics:
- flexible enough to capture persistence and cross-factor interactions,
- still linear-Gaussian (useful for Kalman filtering),
- aligns with a control-theory / LQ framework later.
The fitted VAR parameters $(A, Q)$ become the state transition in the Kalman filter. We use the VAR class from the statsmodels library.
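As a cross-check on what a VAR(1) fit computes, the sketch below estimates the same kind of model by plain least squares in NumPy on simulated data with known dynamics. The helper `fit_var1` and all numerical values are illustrative assumptions, not part of the notebook. It includes an intercept $c$, which matters because statsmodels' `VAR.fit` also includes a constant by default, whereas the transition equation above carries only $A$.

```python
import numpy as np

def fit_var1(X):
    """OLS fit of x_{t+1} = c + A x_t + eta_{t+1}; X has shape (T, k)."""
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([np.ones(len(X) - 1), X[:-1]])   # regressors [1, x_t]
    coef, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)     # shape (k + 1, k)
    c, A = coef[0], coef[1:].T
    resid = X[1:] - Z @ coef
    Q = resid.T @ resid / (len(Z) - Z.shape[1])          # dof-adjusted covariance
    return c, A, Q

# Simulated factors with known dynamics (illustrative values only)
rng = np.random.default_rng(0)
A_true = np.array([[0.95, 0.02], [0.00, 0.80]])
c_true = np.array([0.002, -0.001])
x = np.zeros((5000, 2))
for t in range(1, len(x)):
    x[t] = c_true + A_true @ x[t - 1] + 0.001 * rng.standard_normal(2)

c_est, A_est, Q_est = fit_var1(x)
print(np.round(A_est, 3))
```

If the intercept is dropped when the fitted $A$ is reused elsewhere, the implied long-run mean of the factors shifts to zero, so it is worth deciding explicitly whether to demean the factors or to carry $c$ into the state equation.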
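var_model = VAR(factors_ols)
# VAR.fit includes a constant term by default; only the lag-1 matrix and the
# residual covariance are carried forward into the Kalman filter.
var_res = var_model.fit(maxlags=1)
A = var_res.coefs[0] # transition matrix
Q = var_res.sigma_u # state noise covariance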
print("Transition matrix A:")
print(A)
print("State noise covariance Q:")
print(Q)
Transition matrix A:
[[ 0.91940097 0.08694316 -0.04380703]
[ 0.07225808 0.88825823 0.04739625]
[ 0.1174264 -0.12157507 1.05306669]]
State noise covariance Q:
level slope curvature
level 0.000129 -0.000121 -0.000251
slope -0.000121 0.000121 0.000233
curvature -0.000251 0.000233 0.000525
C:\Users\thoma\.conda\envs\pymc_env\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency ME will be used. self._init_dates(dates, freq)
4) Measurement equation and residual covariance
In the measurement equation, yields are linear in the factors (given $\lambda$):
$ Y_t = y_t (\tau) = H(\lambda, \tau) B_t + \varepsilon_t,\quad \varepsilon_t \sim \mathcal{N}(0, R), $
where $H$ contains the factor loadings.
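Key practical step: estimate $R$ (measurement noise) from the OLS residuals. For parsimony, the code keeps only the average residual variance across maturities, so that $R = \sigma^2 I$. We also ensure that: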
- maturities are aligned across dates,
- missing values are handled consistently,
- the clean yield matrix $Y$ and factor estimates are dimensionally compatible.
This makes the later state-space computations robust and reproducible.
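The following sketch reproduces this step on synthetic data where the true measurement noise is known. Everything in it is an assumption made for illustration: the helper `ns_loadings`, the value of $\lambda$, the factor distribution, and the 5 bp noise level. It fits the cross-sectional OLS for all dates at once and collapses the residual covariance to a scalar variance.

```python
import numpy as np

def ns_loadings(tau, lam):
    """Nelson-Siegel loadings (level, slope, curvature) for tau > 0."""
    x = lam * np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])

rng = np.random.default_rng(1)
tau = np.arange(1.0, 31.0)            # 30 maturities, in years
H = ns_loadings(tau, lam=0.5)         # illustrative lambda, not the notebook's
T, n = 2000, len(tau)
sigma_true = 5e-4                     # 5 bp measurement noise (assumed)

B = rng.normal([0.04, -0.01, 0.01], 0.01, size=(T, 3))      # synthetic factors
Y = B @ H.T + sigma_true * rng.standard_normal((T, n))      # synthetic yields

# Cross-sectional OLS for all dates at once, then residuals
B_ols = np.linalg.lstsq(H, Y.T, rcond=None)[0].T            # (T, 3)
E = Y - B_ols @ H.T                                         # (T, n)

sigma2_hat = float(np.mean(np.diag(np.cov(E, rowvar=False))))
R = sigma2_hat * np.eye(n)
print("ratio to true variance:", sigma2_hat / sigma_true**2)
```

In this setup the ratio should come out close to $(n-3)/n = 0.9$ rather than 1, because three factors are fitted per date and absorb part of the noise. A residual-based $R$ is therefore a slightly conservative starting value, which is one reason to re-estimate it by maximum likelihood later.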
def estimate_measurement_cov(df_yields, factors_ols, lam):
    # Align by date
    df_y, df_b = df_yields.align(factors_ols, join="inner", axis=0)
    # Coerce to numeric and drop any rows with NaNs
    df_y = df_y.apply(pd.to_numeric, errors="coerce")
    df_b = df_b.apply(pd.to_numeric, errors="coerce")
    row_mask = df_y.notna().all(axis=1) & df_b.notna().all(axis=1)
    df_y = df_y.loc[row_mask]
    df_b = df_b.loc[row_mask]
    print("Measurement cov – using", df_y.shape[0], "dates")
    maturities = np.array([float(c) for c in df_y.columns])
    H = dl_loadings(maturities, lam) # (n, 3)
    Y = df_y.values # (T, n)
    B = df_b.values # (T, 3)
    residuals = []
    for t in range(Y.shape[0]):
        y_t = Y[t, :] # (n,)
        beta_t = B[t, :] # (3,)
        y_hat_t = H @ beta_t # (n,)
        e_t = y_t - y_hat_t
        residuals.append(e_t)
    E = np.vstack(residuals) # (T, n)
    R_full = np.cov(E, rowvar=False) # cov across maturities
    print("Any NaNs in R_full?", np.isnan(R_full).any())
    # Collapse to a scalar variance: R = sigma^2 * I
    sigma2 = float(np.nanmean(np.diag(R_full)))
    n = df_y.shape[1]
    R = sigma2 * np.eye(n)
    return R, H, df_y, df_b
R, H, df_y_clean, df_b_clean = estimate_measurement_cov(df_yields, factors_ols, lam)
print("Shapes: Y", df_y_clean.shape, "H", H.shape, "R", R.shape)
Measurement cov – using 433 dates
Any NaNs in R_full? False
Shapes: Y (433, 30) H (30, 3) R (30, 30)
5) Kalman filtering and smoothing (DNS)
The OLS factors treat each date independently. A state-space approach instead combines:
- cross-sectional information from the yield curve at date $t$,
- time-series information from the factor dynamics.
The Kalman filter is the core algorithm for inference in linear Gaussian state-space models. In this project, it is used to infer the latent yield-curve factors (level, slope, curvature) from observed yields.
We compute:
- Predicted state: $ \beta_{t|t-1} $ (before seeing yields at $t$)
- Filtered state: $ \beta_{t|t} $ (after incorporating yields at $t$)
- Smoothed state: $ \beta_{t|T} $ (using the full sample $1..T$)
The Kalman framework distinguishes these three different estimates of the state.
Smoothed factors are especially useful for downstream economic applications because they reduce estimation noise while staying model-consistent.
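For reference, the three estimates can be written as the following recursions. This is the standard textbook form for a linear Gaussian model with time-invariant $A$, $H$, $Q$, $R$ and no intercept, written in the notation of this section. The symbols $v_t$, $S_t$, $K_t$ and $C_t$ are labels introduced here for the innovation, its covariance, the Kalman gain and the smoother gain.

```latex
\begin{aligned}
&\textbf{Prediction:} &
\beta_{t|t-1} &= A\,\beta_{t-1|t-1}, &
P_{t|t-1} &= A\,P_{t-1|t-1}\,A^\top + Q \\
&\textbf{Update:} &
v_t &= y_t - H\,\beta_{t|t-1}, &
S_t &= H\,P_{t|t-1}\,H^\top + R \\
& &
K_t &= P_{t|t-1}\,H^\top S_t^{-1}, &
\beta_{t|t} &= \beta_{t|t-1} + K_t\,v_t \\
& &
P_{t|t} &= (I - K_t H)\,P_{t|t-1} & & \\
&\textbf{RTS smoother:} &
C_t &= P_{t|t}\,A^\top P_{t+1|t}^{-1}, &
\beta_{t|T} &= \beta_{t|t} + C_t\,(\beta_{t+1|T} - \beta_{t+1|t}) \\
& &
P_{t|T} &= P_{t|t} + C_t\,(P_{t+1|T} - P_{t+1|t})\,C_t^\top & &
\end{aligned}
```

The forward pass runs for $t = 1, \dots, T$ from an initial $(\beta_{0|0}, P_{0|0})$; the backward pass starts from $\beta_{T|T}$ and runs for $t = T-1, \dots, 1$.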
def kalman_filter_smoother(Y, H, A, Q, R, beta0=None, P0=None):
    """Kalman filter (forward pass) and RTS smoother (backward pass)."""
    Y = np.asarray(Y)
    H = np.asarray(H)
    A = np.asarray(A)
    Q = np.asarray(Q)
    R = np.asarray(R)
    T, n_mats = Y.shape
    n_states = A.shape[0]
    assert H.shape == (n_mats, n_states), f"H shape {H.shape} != ({n_mats},{n_states})"
    assert A.shape == (n_states, n_states)
    assert Q.shape == (n_states, n_states)
    assert R.shape == (n_mats, n_mats)
    beta_pred = np.zeros((T, n_states))
    P_pred = np.zeros((T, n_states, n_states))
    beta_filt = np.zeros((T, n_states))
    P_filt = np.zeros((T, n_states, n_states))
    I = np.eye(n_states)
    if beta0 is None:
        beta0 = np.zeros(n_states)
    if P0 is None:
        P0 = 10.0 * np.eye(n_states)
    beta_prev = beta0
    P_prev = P0
    for t in range(T):
        # Prediction
        beta_t_pred = A @ beta_prev
        P_t_pred = A @ P_prev @ A.T + Q
        # Update
        y_t = Y[t, :] # (n_mats,)
        S_t = H @ P_t_pred @ H.T + R # (n_mats, n_mats)
        K_t = P_t_pred @ H.T @ np.linalg.inv(S_t) # (n_states, n_mats)
        y_hat_t = H @ beta_t_pred # (n_mats,)
        innov = y_t - y_hat_t # (n_mats,)
        beta_t_filt = beta_t_pred + K_t @ innov
        P_t_filt = (I - K_t @ H) @ P_t_pred
        beta_pred[t] = beta_t_pred
        P_pred[t] = P_t_pred
        beta_filt[t] = beta_t_filt
        P_filt[t] = P_t_filt
        beta_prev = beta_t_filt
        P_prev = P_t_filt
    # RTS smoother
    beta_smooth = np.zeros_like(beta_filt)
    P_smooth = np.zeros_like(P_filt)
    beta_smooth[-1] = beta_filt[-1]
    P_smooth[-1] = P_filt[-1]
    for t in range(T - 2, -1, -1):
        P_f = P_filt[t]
        P_p_next = P_pred[t + 1]
        C_t = P_f @ A.T @ np.linalg.inv(P_p_next) # (n_states, n_states)
        beta_smooth[t] = beta_filt[t] + C_t @ (beta_smooth[t + 1] - beta_pred[t + 1])
        P_smooth[t] = P_f + C_t @ (P_smooth[t + 1] - P_p_next) @ C_t.T
    return beta_filt, beta_smooth
# 1) OLS factors
factors_ols = estimate_diebold_li_factors(df_yields, lam)
# 2) A, Q from VAR on OLS factors
var_model = VAR(factors_ols)
var_res = var_model.fit(maxlags=1)
A = var_res.coefs[0]
Q = var_res.sigma_u
# 3) R, H, and cleaned yields/factors
R, H, df_y_clean, df_b_clean = estimate_measurement_cov(df_yields, factors_ols, lam)
print("Shapes before Kalman:")
print("Y:", df_y_clean.shape)
print("H:", H.shape)
print("A:", A.shape)
print("Q:", Q.shape)
print("R:", R.shape)
# 4) Run Kalman + smoother
Y = df_y_clean.values # (T, n)
beta0 = df_b_clean.iloc[0].values # first OLS beta as init
P0 = np.eye(3)
beta_filt, beta_smooth = kalman_filter_smoother(Y, H, A, Q, R, beta0, P0)
idx = df_b_clean.index
cols = ["level", "slope", "curvature"]
factors_filt = pd.DataFrame(beta_filt, index=idx, columns=cols)
factors_smooth = pd.DataFrame(beta_smooth, index=idx, columns=cols)
Measurement cov – using 433 dates
Any NaNs in R_full? False
Shapes before Kalman:
Y: (433, 30)
H: (30, 3)
A: (3, 3)
Q: (3, 3)
R: (30, 30)
C:\Users\thoma\.conda\envs\pymc_env\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency ME will be used. self._init_dates(dates, freq)
6) Applying the Kalman filter/smoother
We initialize the state and covariance and run the filter forward and the RTS smoother backward.
Two sanity checks matter here:
- Shapes: $Y$ is (T × N maturities), $H$ is (N × 3), states are (T × 3).
- Scale: yields should be in consistent units (e.g., decimals rather than percent) throughout.
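The filter computes three versions of the factor estimates (predicted, filtered, smoothed); `kalman_filter_smoother` returns the filtered and smoothed series, while the predicted states are used internally by the smoother.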
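The two sanity checks can be automated with a small guard that runs before the filter. The helper below is a hypothetical sketch, not part of the notebook: the name `check_kalman_inputs` and the threshold `max_abs_yield=1.0` (yields above 100% are taken as a sign of percent units) are assumptions chosen for illustration.

```python
import numpy as np

def check_kalman_inputs(Y, H, A, Q, R, max_abs_yield=1.0):
    """Raise ValueError if shapes or units look inconsistent (illustrative checks)."""
    Y, H, A, Q, R = (np.asarray(M, dtype=float) for M in (Y, H, A, Q, R))
    T, n = Y.shape
    k = A.shape[0]
    expected = {"H": (H, (n, k)), "A": (A, (k, k)), "Q": (Q, (k, k)), "R": (R, (n, n))}
    for name, (M, shape) in expected.items():
        if M.shape != shape:
            raise ValueError(f"{name} has shape {M.shape}, expected {shape}")
    if np.isnan(Y).any():
        raise ValueError("Y contains NaNs; the filter above assumes a complete panel")
    if np.abs(Y).max() > max_abs_yield:
        raise ValueError("Yields exceed 100% in absolute value; are they in percent?")
    return T, n, k

# Toy inputs: 5 dates, 4 maturities, 3 factors, yields in decimals
Y_demo = np.full((5, 4), 0.03)
H_demo = np.ones((4, 3))
A_demo, Q_demo, R_demo = np.eye(3), 1e-4 * np.eye(3), 1e-6 * np.eye(4)
print(check_kalman_inputs(Y_demo, H_demo, A_demo, Q_demo, R_demo))
```

Passing the same panel multiplied by 100 would trigger the units check, which is the kind of silent mismatch that otherwise only shows up as implausible factor estimates.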
# Initial state: use first OLS estimate as beta0
beta0 = df_b_clean.iloc[0].values.astype(float) # shape (3,)
P0 = np.eye(3) * 1.0 # initial covariance; you can tweak the scale
# Y is the observation matrix (T, n_mats)
Y = df_y_clean.values.astype(float)
beta_filt, beta_smooth = kalman_filter_smoother(
Y, H, A, Q, R,
beta0=beta0,
P0=P0
)
beta_filt.shape, beta_smooth.shape
((433, 3), (433, 3))
idx = df_b_clean.index
cols = ["level", "slope", "curvature"]
factors_filt = pd.DataFrame(beta_filt, index=idx, columns=cols)
factors_smooth = pd.DataFrame(beta_smooth, index=idx, columns=cols)
factors_filt.to_csv(DATA_DIR / "dl_factors_kalman_filtered_sample.csv")
factors_smooth.to_csv(DATA_DIR / "dl_factors_kalman_smoothed_sample.csv")
factors_smooth.head()
| | level | slope | curvature |
|---|---|---|---|
| 1990-01-31 | 0.078016 | 0.003215 | 0.014766 |
| 1990-02-28 | 0.075872 | 0.006037 | 0.020238 |
| 1990-03-31 | 0.079267 | 0.005030 | 0.012670 |
| 1990-04-30 | 0.077276 | 0.009562 | 0.023612 |
| 1990-05-31 | 0.072786 | 0.009663 | 0.024977 |
7) OLS vs Kalman factors (what should we expect?)
OLS and Kalman-smoothed factors can look very close in benign settings because:
- the Nelson–Siegel cross-section is already very informative,
- the VAR dynamics mostly provide gentle time-series regularization.
The real value of state-space estimation shows up when:
- measurement noise is material,
- missing observations occur,
- we extend the model (e.g., AFNS adjustment),
- we need probabilistic filtering objects (pred/filtered/smoothed) for decision-making.
This is a necessary stepping stone to AFNS and to dynamic hedging.
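A one-dimensional toy model shows the effect of material measurement noise more clearly than the yield panel does. The sketch below is an assumption-laden illustration, separate from the notebook's model: a scalar random-walk state observed with noise, a hypothetical helper `local_level_filter_smoother`, and arbitrary variances `q` and `r`. It compares the mean squared error of raw observations, filtered estimates and smoothed estimates against the true state.

```python
import numpy as np

def local_level_filter_smoother(y, q, r, x0=0.0, p0=10.0):
    """Scalar Kalman filter + RTS smoother for x_t = x_{t-1} + w_t, y_t = x_t + v_t."""
    T = len(y)
    x_pred, p_pred = np.zeros(T), np.zeros(T)
    x_filt, p_filt = np.zeros(T), np.zeros(T)
    x_prev, p_prev = x0, p0
    for t in range(T):
        x_pred[t], p_pred[t] = x_prev, p_prev + q          # predict
        k = p_pred[t] / (p_pred[t] + r)                    # Kalman gain
        x_filt[t] = x_pred[t] + k * (y[t] - x_pred[t])     # update
        p_filt[t] = (1.0 - k) * p_pred[t]
        x_prev, p_prev = x_filt[t], p_filt[t]
    x_smooth = x_filt.copy()
    for t in range(T - 2, -1, -1):                         # backward RTS pass
        c = p_filt[t] / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + c * (x_smooth[t + 1] - x_pred[t + 1])
    return x_filt, x_smooth

rng = np.random.default_rng(42)
T, q, r = 2000, 0.1, 1.0                                   # illustrative variances
x = np.cumsum(np.sqrt(q) * rng.standard_normal(T))         # latent random walk
y = x + np.sqrt(r) * rng.standard_normal(T)                # noisy observations

x_filt, x_smooth = local_level_filter_smoother(y, q, r)
mse = lambda est: float(np.mean((est - x) ** 2))
print(f"raw: {mse(y):.3f}  filtered: {mse(x_filt):.3f}  smoothed: {mse(x_smooth):.3f}")
```

With observation noise this large relative to the state innovations, the ordering raw > filtered > smoothed should be clear. In the yield-curve setting the cross-section of 30 maturities already averages out most noise, which is consistent with OLS and Kalman factors looking similar.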
plt.figure(figsize=(10,4))
plt.plot(df_b_clean["level"], label="OLS", alpha=0.6)
plt.plot(factors_smooth["level"], label="Kalman smooth", alpha=0.8)
plt.title("Level factor – OLS vs Kalman")
plt.legend()
plt.grid(True)
plt.show()
plt.figure(figsize=(10,4))
plt.plot(df_b_clean["slope"], label="OLS", alpha=0.6)
plt.plot(factors_smooth["slope"], label="Kalman smooth", alpha=0.8)
plt.title("Slope factor – OLS vs Kalman")
plt.legend()
plt.grid(True)
plt.show()
plt.figure(figsize=(10,4))
plt.plot(df_b_clean["curvature"], label="OLS", alpha=0.6)
plt.plot(factors_smooth["curvature"], label="Kalman smooth", alpha=0.8)
plt.title("Curvature factor – OLS vs Kalman")
plt.legend()
plt.grid(True)
plt.show()
8) Fully latent state-space estimation via MLE
Instead of treating OLS + VAR as “two-step”, we can estimate a coherent state-space model by maximum likelihood:
- Transition parameters: $A, Q$
- Measurement parameters: $R$ (and potentially $\lambda$ in some variants)
This step is closer to how term structure models are often estimated in practice: choose parameters to maximize the likelihood implied by the Kalman filter.
We start with sensible initial values and then optimize the negative log-likelihood.
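For reference, the likelihood being maximized is the standard prediction-error decomposition for a linear Gaussian state-space model. In the notation used for the filter, with $n$ maturities per date and $\theta$ collecting the 9 entries of $A$, the 6 Cholesky parameters of $Q$ and $\log \sigma$ (16 parameters in total), it reads:

```latex
\begin{aligned}
v_t &= y_t - H\,\beta_{t|t-1}, \qquad
S_t = H\,P_{t|t-1}\,H^\top + \sigma^2 I, \\
\log L(\theta) &= -\frac{1}{2} \sum_{t=1}^{T}
\left[\, n \log(2\pi) + \log\lvert S_t \rvert + v_t^\top S_t^{-1} v_t \,\right].
\end{aligned}
```

Each term is the Gaussian log-density of the one-step-ahead forecast error $v_t$, so a single forward pass of the Kalman filter is enough to evaluate $\log L(\theta)$ for a given $\theta$.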
def unpack_params(theta):
    """
    Map parameter vector theta (length 16) -> (A, Q, R_scalar).
    A: 3x3
    Q: 3x3 (PSD via Cholesky L L')
    R: scalar variance (we'll build R = R * I outside)
    """
    theta = np.asarray(theta)
    assert theta.size == 16
    # A entries
    A_flat = theta[0:9]
    A = A_flat.reshape(3, 3)
    # Cholesky L parameters for Q
    l11, l21, l22, l31, l32, l33 = theta[9:15]
    L = np.array([
        [np.exp(l11), 0.0, 0.0],
        [l21, np.exp(l22), 0.0],
        [l31, l32, np.exp(l33)]
    ])
    Q = L @ L.T
    # Measurement variance
    log_sigma = theta[15]
    sigma = np.exp(log_sigma)
    R_scalar = sigma**2
    return A, Q, R_scalar
def kalman_loglik(theta, Y, H, beta0=None, P0=None):
    """
    Negative log-likelihood for given theta, using Kalman filter.
    theta: parameter vector (length 16)
    Y: (T, n) array of yields
    H: (n, 3) loadings matrix
    beta0: initial state mean (3,)
    P0: initial state covariance (3,3)
    Returns: negative log-likelihood (float)
    """
    Y = np.asarray(Y)
    T, n = Y.shape
    A, Q, R_scalar = unpack_params(theta)
    R = R_scalar * np.eye(n)
    k = 3
    if beta0 is None:
        beta0 = np.zeros(k)
    if P0 is None:
        P0 = 10.0 * np.eye(k)
    beta_prev = beta0
    P_prev = P0
    I_k = np.eye(k)
    loglik = 0.0
    const = n * np.log(2 * np.pi)
    for t in range(T):
        # Prediction
        beta_pred = A @ beta_prev
        P_pred = A @ P_prev @ A.T + Q
        y_t = Y[t, :] # (n,)
        # Innovation
        S_t = H @ P_pred @ H.T + R # (n,n)
        try:
            S_inv = np.linalg.inv(S_t)
            sign, logdet = np.linalg.slogdet(S_t)
            if sign <= 0:
                # Penalize non-PD S_t
                return 1e6
        except np.linalg.LinAlgError:
            return 1e6
        y_hat = H @ beta_pred
        innov = y_t - y_hat # (n,)
        # Contribution to log-likelihood
        quad = innov.T @ S_inv @ innov
        loglik_t = -0.5 * (const + logdet + quad)
        loglik += loglik_t
        # Kalman update (for next step)
        K_t = P_pred @ H.T @ S_inv # (3,n)
        beta_filt = beta_pred + K_t @ innov
        P_filt = (I_k - K_t @ H) @ P_pred
        beta_prev, P_prev = beta_filt, P_filt
    # We return negative log-likelihood for minimization
    return -float(loglik)
def cholesky_param_from_Q(Q):
    """
    Take a 3x3 positive-definite Q and get Cholesky parameter vector
    (l11, l21, l22, l31, l32, l33)
    such that L L' = Q and L has exp(diag) structure.
    """
    # np.linalg.cholesky raises LinAlgError if Q is not positive definite
    L0 = np.linalg.cholesky(Q)
    # enforce positive diag via exp parameterization
    l11 = np.log(L0[0,0])
    l22 = np.log(L0[1,1])
    l33 = np.log(L0[2,2])
    # off-diagonals stay as is
    l21 = L0[1,0]
    l31 = L0[2,0]
    l32 = L0[2,1]
    return np.array([l11, l21, l22, l31, l32, l33])
def initial_theta_from_2step(A_init, Q_init, R_sigma2_init):
    A_flat = A_init.flatten()
    l_params = cholesky_param_from_Q(Q_init)
    log_sigma0 = 0.5 * np.log(R_sigma2_init)
    theta0 = np.concatenate([A_flat, l_params, np.array([log_sigma0])])
    return theta0
9) Initialization choices for MLE
State-space likelihood optimization is sensitive to starting points.
We use:
- a stable initial $A$ (eigenvalues inside the unit circle),
- small but non-zero $Q$ to allow realistic factor innovations,
- diagonal $R$ as a parsimonious first approximation to measurement noise.
The goal is not “perfect initialization”, but a starting point that avoids numerical pathologies and lets the optimizer find a plausible region of parameter space.
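Both properties of the starting point can be checked numerically before launching the optimizer. The sketch below is illustrative: `spectral_radius` and `is_positive_definite` are hypothetical helpers, and the matrices are the starting values used in this section.

```python
import numpy as np

def spectral_radius(A):
    """Largest eigenvalue modulus of A; below 1 means the VAR(1) is stable."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def is_positive_definite(Q):
    """True if the symmetric matrix Q admits a Cholesky factorization."""
    try:
        np.linalg.cholesky(Q)
        return True
    except np.linalg.LinAlgError:
        return False

# Starting values used in this section
A_init = np.array([[0.98, 0.01, 0.00], [0.00, 0.90, 0.05], [0.00, -0.05, 0.80]])
Q_init = np.diag([0.0005, 0.0010, 0.0010])

print("spectral radius of A_init:", round(spectral_radius(A_init), 4))
print("Q_init positive definite:", is_positive_definite(Q_init))
```

The same two checks are useful on the optimizer's output, since nothing in the parameterization forces the estimated $A$ to remain stable.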
# 1. Initial A, Q, R
A_init = np.array([
[0.98, 0.01, 0.00],
[0.00, 0.90, 0.05],
[0.00, -0.05, 0.80]
])
Q_init = np.array([
[0.0005, 0.0, 0.0],
[0.0, 0.0010, 0.0],
[0.0, 0.0, 0.0010]
])
R_sigma2_init = 1e-4
# 2. Turn Q into Cholesky parameters
l_params = cholesky_param_from_Q(Q_init)
# 3. Flatten A and make log-sigma
A_flat = A_init.flatten()
log_sigma0 = 0.5 * np.log(R_sigma2_init)
# 4. Full initial vector (length 16)
theta0 = np.concatenate([
A_flat,
l_params,
np.array([log_sigma0])
])
print(theta0)
[ 0.98 0.01 0. 0. 0.9 0.05 0. -0.05 0.8 -3.80045123 0. -3.45387764 0. 0. -3.45387764 -4.60517019]
10) Maximum likelihood estimation
We optimize the negative log-likelihood produced by the Kalman filter.
Practical notes:
- positivity constraints are enforced implicitly by parameterizing variances in log-space; stability of $A$ is not constrained here and should be checked after estimation,
- we monitor convergence and sanity-check parameter magnitudes,
- the end product is a set of parameters that make the observed yield panel most likely under the model.
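The log-space parameterization can be verified in isolation with a round trip. The sketch below mirrors the idea behind `unpack_params` and `cholesky_param_from_Q` but is self-contained; the helper names `q_to_params` and `params_to_q` and the test matrices are illustrative assumptions.

```python
import numpy as np

def q_to_params(Q):
    """Positive-definite 3x3 Q -> 6 unconstrained parameters (log-diagonal Cholesky)."""
    L = np.linalg.cholesky(Q)
    return np.array([np.log(L[0, 0]), L[1, 0], np.log(L[1, 1]),
                     L[2, 0], L[2, 1], np.log(L[2, 2])])

def params_to_q(p):
    """Inverse map: any real 6-vector yields a symmetric positive-definite Q."""
    l11, l21, l22, l31, l32, l33 = p
    L = np.array([[np.exp(l11), 0.0, 0.0],
                  [l21, np.exp(l22), 0.0],
                  [l31, l32, np.exp(l33)]])
    return L @ L.T

Q = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])  # illustrative
assert np.allclose(params_to_q(q_to_params(Q)), Q)

# An arbitrary unconstrained vector still maps to a valid covariance matrix
Q_any = params_to_q(np.array([-1.0, 0.3, -0.5, -0.2, 0.1, 0.2]))
print(np.linalg.eigvalsh(Q_any).min() > 0)
```

Because the optimizer works on the unconstrained vector, it can take any step without leaving the set of valid covariance matrices, which is why no explicit bounds are needed for $Q$ or $\sigma$.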
from scipy.optimize import minimize
# Y, H from cleaned dataset and loadings
Y = df_y_clean.values.astype(float)
maturities = np.array([float(c) for c in df_y_clean.columns])
H = dl_loadings(maturities, lam) # (n,3)
beta0 = df_b_clean.iloc[0].values # or zeros
P0 = np.eye(3) * 1.0
def objective(theta):
    return kalman_loglik(theta, Y, H, beta0=beta0, P0=P0)
res = minimize(
objective,
theta0,
method="L-BFGS-B",
options={"maxiter": 200, "disp": True}
)
print("Converged:", res.success)
print("Final neg loglik:", res.fun)
Converged: True
Final neg loglik: -65595.06096658931
11) Estimated parameters and interpretation
After optimization, we extract:
- $A$: persistence and cross-factor transmission
- $Q$: variance of factor shocks (state noise)
- $R = \sigma^2 I$: measurement noise, with a single variance shared by all maturities (observation noise)
A useful sanity check is that:
- $A$ implies persistent but stable factors,
- $Q$ is not degenerate (not all zeros),
- $\sigma^2$ does not explode (a scalar $R$ cannot absorb maturity-specific noise, so a large value signals a poor overall fit).
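These checks are easier to read when translated into interpretable quantities. The helper below is a hypothetical sketch with illustrative inputs, not the notebook's estimates. It reports the spectral radius of $A$, the half-life of the most persistent mode, and shock volatilities annualized with a square-root-of-time rule, which is itself an approximation that ignores mean reversion.

```python
import numpy as np

def summarize_dynamics(A, Q, periods_per_year=12):
    """Spectral radius, half-life of the slowest mode, and annualized shock vols."""
    rho = float(np.max(np.abs(np.linalg.eigvals(A))))
    half_life = np.log(0.5) / np.log(rho) if 0.0 < rho < 1.0 else np.inf
    shock_vol = np.sqrt(np.diag(Q) * periods_per_year)
    return {"spectral_radius": rho, "half_life_periods": float(half_life),
            "annualized_shock_vol": shock_vol}

# Illustrative inputs (not the notebook's estimates)
A_demo = np.diag([0.98, 0.90, 0.80])
Q_demo = np.diag([1e-6, 4e-6, 9e-6])
out = summarize_dynamics(A_demo, Q_demo)
print(out)
```

A spectral radius of 0.98 at monthly frequency corresponds to a half-life of roughly 34 months, which gives a sense of scale for what "persistent but stable" means for a level factor.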
theta_hat = res.x
A_hat, Q_hat, R_scalar_hat = unpack_params(theta_hat)
R_hat = R_scalar_hat * np.eye(Y.shape[1])
12) Smoothed factors under MLE
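With the MLE parameters fixed, we re-run the Kalman filter and smoother to produce the final factor estimates. We then also estimate $\lambda$ by profiling the likelihood: for each value on a grid, we re-optimize the remaining parameters and keep the $\lambda$ with the lowest negative log-likelihood.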
These smoothed factors are the state variables we will carry forward into:
- AFNS (no-arbitrage yield adjustment),
- balance sheet / NII simulation,
- control and RL environments.
From this point, the modeling focus shifts from “fit the curve” to “use the curve as a state in a decision problem”.
beta_filt_mle, beta_smooth_mle = kalman_filter_smoother(
Y, H, A_hat, Q_hat, R_hat,
beta0=beta0,
P0=P0
)
factors_smooth_mle = pd.DataFrame(
beta_smooth_mle,
index=df_y_clean.index,
columns=["level","slope","curvature"]
)
def mle_given_lambda(lam, Y, maturities, theta0):
H = dl_loadings(maturities, lam)
def objective(theta):
return kalman_loglik(theta, Y, H, beta0=beta0, P0=P0)
res = minimize(objective, theta0, method="L-BFGS-B",
options={"maxiter": 200})
return res.fun, res.x # neg loglik, theta_hat
Y = df_y_clean.values
maturities = np.array([float(c) for c in df_y_clean.columns])
lambda_grid = np.linspace(0.01, 1.2, 30) # adjust as you like
best_val = np.inf
best_lam = None
best_theta = None
for lam_try in lambda_grid:
neg_ll, theta_hat = mle_given_lambda(lam_try, Y, maturities, theta0)
print(lam_try, neg_ll)
if neg_ll < best_val:
best_val = neg_ll
best_lam = lam_try
best_theta = theta_hat
print("Best lambda:", best_lam, "neg loglik:", best_val)
# Summary of all ML estimates we will carry forward
A_hat, Q_hat, R_scalar_hat = unpack_params(best_theta)
R_hat = R_scalar_hat * np.eye(Y.shape[1])
H_hat = dl_loadings(maturities, best_lam)
0.01 -58523.40386389547
0.05103448275862069 -68235.44496930015
0.09206896551724138 -69411.19470853177
0.13310344827586207 -69003.10620696761
0.17413793103448277 -68297.39647498455
0.21517241379310348 -68300.57319024569
0.25620689655172413 -68504.16957735507
0.29724137931034483 -67714.65179067253
0.33827586206896554 -67569.73645112943
0.37931034482758624 -67542.85889860686
0.42034482758620695 -67685.70734203488
0.4613793103448276 -67438.99810142393
0.5024137931034482 -66801.69443086201
0.543448275862069 -66676.98152542647
0.5844827586206897 -67163.57699394242
0.6255172413793104 -67431.45961477139
0.6665517241379311 -67198.9383200503
0.7075862068965517 -65940.00632280785
0.7486206896551725 -66564.35026607805
0.7896551724137931 -66795.19893583513
0.8306896551724139 -66349.21308869353
0.8717241379310345 -66461.79086466972
0.9127586206896552 -64896.40307781075
0.953793103448276 -64560.713959196124
0.9948275862068966 -64161.10854262734
1.0358620689655174 -65162.95527895225
1.076896551724138 -63435.62354720741
1.1179310344827587 -63281.93443362484
1.1589655172413793 -64318.54262469717
1.2 -63397.063454094154
Best lambda: 0.09206896551724138 neg loglik: -69411.19470853177
Part II — AFNS (Arbitrage-Free Nelson–Siegel)¶
From Diebold–Li to AFNS: No-Arbitrage Term Structure Modeling¶
This project models the yield curve using the Arbitrage-Free Nelson–Siegel (AFNS) framework originally introduced by Christensen, Diebold, and Rudebusch (2009). The AFNS model builds directly on the Diebold–Li dynamic Nelson–Siegel (DNS) model, enhancing it with no-arbitrage restrictions.
The Diebold–Li (Dynamic Nelson–Siegel) Model¶
The Diebold–Li model represents the zero-coupon yield curve at time $t$ as a linear function of three latent factors:
$ y_t(\tau) = L_t + S_t \frac{1 - e^{-\lambda \tau}}{\lambda \tau} + C_t \left( \frac{1 - e^{-\lambda \tau}}{\lambda \tau} - e^{-\lambda \tau} \right) $
where:
- $L_t$ is the level factor,
- $S_t$ is the slope factor,
- $C_t$ is the curvature factor,
- $\lambda$ controls factor loadings across maturities.
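To see how $\lambda$ shapes the loadings, the following sketch evaluates them on a fine maturity grid. The value `lam_example = 0.5` (maturities in years) and the helper name `dns_loadings_sketch` are assumptions made for illustration.

```python
import numpy as np

def dns_loadings_sketch(tau, lam):
    """Level, slope and curvature loadings of the Nelson-Siegel curve."""
    tau = np.asarray(tau, dtype=float)
    x = lam * tau
    slope = -np.expm1(-x) / x            # (1 - e^{-x}) / x, stable for small x
    curvature = slope - np.exp(-x)
    level = np.ones_like(tau)
    return level, slope, curvature

lam_example = 0.5                         # assumed decay parameter
tau_grid = np.linspace(0.05, 30.0, 6000)  # maturities in years
level, slope, curvature = dns_loadings_sketch(tau_grid, lam_example)

# Maturity at which the curvature loading peaks
tau_star = tau_grid[np.argmax(curvature)]
print(f"curvature loading peaks near tau = {tau_star:.2f} years "
      f"(lambda * tau = {lam_example * tau_star:.3f})")
```

The slope loading decays monotonically from 1 towards 0, while the curvature loading is hump-shaped and peaks where $\lambda \tau \approx 1.79$, so a larger $\lambda$ moves the hump to shorter maturities.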
The factors evolve dynamically, typically as a VAR(1):
$ \mathbf{X}_{t+1} = \boldsymbol{\mu} + \Phi (\mathbf{X}_t - \boldsymbol{\mu}) + \boldsymbol{\varepsilon}_{t+1} $
The Diebold–Li model is:
- parsimonious,
- empirically successful,
- and highly interpretable.
However, it is purely statistical.
The Key Limitation: Lack of No-Arbitrage¶
The Diebold–Li model does not impose no-arbitrage restrictions.
This has important consequences:
- The model fits yields well, but
- It does not guarantee that yields are consistent with the existence of an underlying stochastic discount factor,
- It cannot be used coherently for pricing interest-rate-sensitive instruments.
In particular, nothing in the Diebold–Li model ensures that yields at different maturities are linked through arbitrage-free pricing relations.
This is acceptable for forecasting, but may pose problems for applications involving hedging, valuation, and balance-sheet risk. Imposing no-arbitrage ensures that forward rates are consistent across maturities, so that hedge positions offset exposures correctly.
Risk-Neutral Pricing and No-Arbitrage¶
In arbitrage-free term-structure models, bond prices are expectations under a risk-neutral probability measure $\mathbb{Q}$:
$ P_t(\tau) = \mathbb{E}^\mathbb{Q}_t \left[ \exp\left( - \int_t^{t+\tau} r_s \, ds \right) \right] $
where:
- $r_t$ is the instantaneous short rate,
- risk premia are absorbed into the change of measure from the physical $\mathbb{P}$ to the risk-neutral $\mathbb{Q}$ measure.
In affine term-structure models, this leads to yields of the form:
$ y_t(\tau) = A(\tau) + B(\tau)^\top \mathbf{X}_t $
with $A(\tau)$ and $B(\tau)$ determined by:
- the dynamics of $\mathbf{X}_t$ under $\mathbb{Q}$,
- and the specification of the short rate.
This structure enforces internal consistency across maturities.
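A one-factor example makes the affine structure tangible. The sketch below assumes a Vasicek short rate $dr_t = \kappa(\theta - r_t)\,dt + \sigma\,dW_t$ with illustrative parameters, and works with log-price coefficients, i.e. $P_t(\tau) = \exp(a(\tau) + b(\tau) r_t)$, so the yield intercept and loading in the text correspond to $-a(\tau)/\tau$ and $-b(\tau)/\tau$. It integrates the coefficient ODEs numerically and compares the result with the known closed form.

```python
import numpy as np
from scipy.integrate import solve_ivp

kappa, theta, sigma = 0.3, 0.04, 0.01      # assumed illustrative parameters
taus = np.array([1.0, 2.0, 5.0, 10.0, 30.0])

def coefficient_odes(tau, z):
    """ODEs for the log-price coefficients with r = X (delta0 = 0, delta1 = 1)."""
    a, b = z
    da = kappa * theta * b + 0.5 * sigma**2 * b**2
    db = -kappa * b - 1.0
    return [da, db]

sol = solve_ivp(coefficient_odes, (0.0, taus[-1]), [0.0, 0.0],
                t_eval=taus, rtol=1e-10, atol=1e-12)
a_num, b_num = sol.y

# Closed-form Vasicek coefficients in the same sign convention
b_cf = -(1.0 - np.exp(-kappa * taus)) / kappa
a_cf = ((theta - sigma**2 / (2.0 * kappa**2)) * (-b_cf - taus)
        - sigma**2 * b_cf**2 / (4.0 * kappa))

r0 = 0.03                                   # assumed current short rate
y_num = -(a_num + b_num * r0) / taus
y_cf = -(a_cf + b_cf * r0) / taus
print(np.column_stack([taus, y_num, y_cf]))
```

Every maturity is priced from the same two functions of $\tau$, which is what "internal consistency across maturities" means in practice.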
The AFNS Model: Making Nelson–Siegel Arbitrage-Free¶
The AFNS model preserves the Nelson–Siegel factor structure while embedding it into an affine no-arbitrage framework.
Short Rate Specification¶
The short rate is defined as a linear function of the Nelson–Siegel factors:
$ r_t = L_t + S_t $
This choice preserves the economic interpretation of the level and slope factors.
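This specification matches the short end of the Nelson–Siegel curve: as $\tau \to 0$ the slope loading tends to 1 and the curvature loading to 0, so the yield tends to $L_t + S_t$. A small numerical check, with factor values and $\lambda$ chosen purely for illustration:

```python
import numpy as np

def ns_yield(tau, L, S, C, lam):
    """Nelson-Siegel yield for a single maturity tau (in years)."""
    x = lam * tau
    slope_loading = -np.expm1(-x) / x        # (1 - e^{-x}) / x
    curv_loading = slope_loading - np.exp(-x)
    return L + S * slope_loading + C * curv_loading

L_t, S_t, C_t, lam = 0.06, -0.02, 0.01, 0.5  # assumed illustrative values
for tau in [1.0, 0.1, 1e-3, 1e-6]:
    print(f"tau = {tau:g}: y = {ns_yield(tau, L_t, S_t, C_t, lam):.6f}")
print("L_t + S_t =", L_t + S_t)
```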
Risk-Neutral Dynamics¶
Under the risk-neutral measure $\mathbb{Q}$, the factors follow affine Gaussian dynamics:
$ d\mathbf{X}_t = K_\mathbb{Q} (\theta_\mathbb{Q} - \mathbf{X}_t) \, dt + \Sigma \, d\mathbf{W}^\mathbb{Q}_t $
These continuous-time dynamics imply closed-form expressions for bond prices and yields.
The Yield Adjustment Term¶
The key difference between DNS and AFNS lies in the yield adjustment term.
Observed yields satisfy: $ y_t(\tau) = A(\tau) + B(\tau)^\top \mathbf{X}_t + \eta_t $
where:
- $B(\tau)$ has the same Nelson–Siegel loadings as in Diebold–Li,
- $A(\tau)$ is a maturity-dependent adjustment term.
This adjustment term:
- depends on the factor volatilities,
- captures Jensen’s inequality effects from stochastic discounting,
- and ensures that yields satisfy no-arbitrage restrictions.
Importantly:
AFNS does not change the factor loadings. It changes the intercept.
This preserves interpretability while enforcing arbitrage-free pricing.
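The size of the adjustment is easiest to gauge from the level factor alone, whose contribution in the independent-factor case is $\sigma_L^2 \tau^2 / 6$, subtracted from the yield. The sketch below assumes a level volatility of 50 bp per year, a number chosen only for illustration.

```python
import numpy as np

sigma_L = 0.005                        # assumed level volatility (50 bp per year)
taus = np.array([1.0, 5.0, 10.0, 30.0])

# Level-factor part of the yield adjustment, in decimal yield units
level_adj = sigma_L**2 * taus**2 / 6.0
for tau, adj in zip(taus, level_adj):
    print(f"tau = {tau:>4.0f}y  adjustment = {1e4 * adj:6.2f} bp")
```

Under this assumption the term lowers the 10-year yield by about 4 bp and the 30-year yield by 37.5 bp: negligible at the short end, material at the long end, and exactly zero when the volatility vanishes.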
Relationship Between DNS and AFNS¶
The AFNS model can be viewed as:
Diebold–Li + a model-consistent yield adjustment term
Key implications:
- DNS is recovered as a special case when volatilities vanish,
- AFNS remains empirically flexible,
- AFNS supports pricing, hedging, and risk-neutral valuation.
Thus, AFNS is a structural refinement, not a competing model.
Why AFNS is important in this project¶
This project studies:
- interest-rate risk in the banking book,
- dynamic hedging with interest-rate derivatives,
- and optimal decision-making under uncertainty.
These tasks require:
- consistent pricing across maturities,
- coherent forward-rate dynamics,
- and economically meaningful hedge payoffs.
AFNS provides:
- a no-arbitrage state-space representation of the yield curve,
- compatibility with Kalman filtering and smoothing,
- and a principled foundation for both LQ control and reinforcement learning.
Conceptual Summary¶
- Diebold–Li offers a flexible statistical representation of the yield curve.
- AFNS embeds this representation into an affine no-arbitrage framework.
- The adjustment term $A(\tau)$ enforces pricing consistency without sacrificing interpretability.
- This makes AFNS the natural choice for applications that bridge econometrics, pricing, and dynamic hedging.
Reference: Christensen, Diebold, and Rudebusch (2009), “The Affine Arbitrage-Free Class of Nelson–Siegel Term Structure Models.”
AFNS approach and implementation used here¶
We implement AFNS as an extension on top of our DNS implementation:
- Keep factor dynamics under the physical measure $\mathbb{P}$ (estimated from data).
- Modify the measurement equation by adding the AFNS adjustment term.
This is a pragmatic “best of both worlds” approach:
- retains the DNS interpretability and estimation pipeline,
- introduces no-arbitrage consistency in yield construction.
Be careful not to confuse $A$, the matrix of DNS VAR coefficients, with $A(\tau)$, the AFNS no-arbitrage adjustment term.
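The conversion from the monthly VAR matrix $A$ to a continuous-time mean-reversion matrix $K$ used in this implementation rests on the relation $A = e^{-K \Delta t}$. A round-trip check, using a made-up stable matrix `A_monthly` rather than the estimated one, could look like this:

```python
import numpy as np
from scipy.linalg import expm, logm

# Illustrative monthly transition matrix (assumption, not the MLE estimate)
A_monthly = np.array([
    [0.98, 0.01, 0.00],
    [0.00, 0.90, 0.02],
    [0.00, 0.00, 0.80],
])
delta_t = 1.0 / 12.0                      # one month in years

# K = -(1/dt) * log(A): mean-reversion matrix in annual units
K = np.real_if_close(-logm(A_monthly) / delta_t)

# Round trip: exp(-K dt) should reproduce the discrete-time matrix
A_back = expm(-K * delta_t)
print("max round-trip error:", np.max(np.abs(A_back - A_monthly)))
print("eigenvalues of K:", np.linalg.eigvals(K))
```

Eigenvalues of $K$ with positive real parts correspond to mean reversion; an eigenvalue of $A$ at or above 1 would produce a non-positive eigenvalue of $K$ and a non-stationary continuous-time system.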
from scipy.linalg import logm
def compute_K_from_A(A, delta_t=1/12):
"""
Given discrete-time A (3x3) and time step delta_t in years (monthly = 1/12),
approximate continuous-time K via matrix logarithm.
"""
A = np.asarray(A)
K = - (1.0 / delta_t) * logm(A) # <-- minus sign here
K = np.real_if_close(K)
return K
def afns_AB_grid(K, Q, delta0, delta1, theta, tau_grid, n_steps=200):
    """
    Compute A(tau), B(tau) on a grid of maturities tau_grid (in years)
    for an AFNS-like model with:
        dX_t = K (theta - X_t) dt + noise
        r_t = delta0 + delta1' X_t
    and bond prices P_t(tau) = exp(A(tau) + B(tau)' X_t).
    Uses simple Euler integration of the ODEs:
        dB/dtau = -K' B - delta1
        dA/dtau = -delta0 + (K theta)' B + 0.5 B' Q B
    K: (3,3)
    Q: (3,3) continuous-time state covariance
    delta0: scalar
    delta1: (3,) vector
    theta: (3,) long-run mean
    tau_grid: array of maturities in years
    n_steps: steps per year for numerical integration
    """
    K = np.asarray(K)
    Q = np.asarray(Q)
    delta1 = np.asarray(delta1).reshape(3,)
    theta = np.asarray(theta).reshape(3,)
    tau_grid = np.asarray(tau_grid)
    taus_sorted = np.sort(tau_grid)
    max_tau = taus_sorted[-1]
    dtau = 1.0 / n_steps  # step in years
    n_iter = int(max_tau / dtau) + 1
    B = np.zeros((3,))  # B(0)
    A = 0.0  # A(0)
    A_vals = {}
    B_vals = {}
    current_tau = 0.0
    idx_tau = 0
    K_T = K.T
    Ktheta = K @ theta
    for i in range(n_iter):
        # store values when we cross a tau in tau_grid
        while idx_tau < len(taus_sorted) and current_tau >= taus_sorted[idx_tau] - 1e-8:
            tau_val = taus_sorted[idx_tau]
            A_vals[tau_val] = A
            B_vals[tau_val] = B.copy()
            idx_tau += 1
        if idx_tau >= len(taus_sorted):
            break
        # Euler step of the ODEs stated in the docstring
        dB = -(K_T @ B) - delta1
        dA = -delta0 + (Ktheta @ B) + 0.5 * (B @ Q @ B)
        B = B + dB * dtau
        A = A + dA * dtau
        current_tau += dtau
    # Convert dicts to arrays aligned with original tau_grid order
    A_array = np.array([A_vals[tau] for tau in tau_grid])
    B_array = np.vstack([B_vals[tau] for tau in tau_grid])  # (n_tau, 3)
    return A_array, B_array
class AFNSFromDL:
def __init__(self, A_P, Q_P, factors_df, maturities, delta0=None, delta1=None, delta_t=1/12):
"""
A_P, Q_P: discrete-time DL MLE dynamics (3x3 each)
factors_df: DataFrame with columns ['level','slope','curvature']
maturities: array-like of maturities in years (e.g. [1.0, 2.0, ..., 30.0])
delta0: scalar for short-rate intercept (if None, set to 0)
delta1: length-3 array (if None, default [1,1,0])
delta_t: time step in years for A_P (monthly = 1/12)
"""
self.A_P = np.asarray(A_P)
self.Q_P = np.asarray(Q_P)
self.factors = factors_df
self.maturities = np.asarray(maturities)
self.delta_t = delta_t
self.K = compute_K_from_A(self.A_P, delta_t=delta_t)
# crude continuous-time Q: scale discrete Q by 1/delta_t (Q_discrete ~ Q_ct * delta_t)
self.Q_ct = self.Q_P / delta_t
if delta1 is None:
self.delta1 = np.array([1.0, 1.0, 0.0])
else:
self.delta1 = np.asarray(delta1).reshape(3,)
if delta0 is None:
self.delta0 = 0.0
else:
self.delta0 = float(delta0)
# long-run mean theta: sample mean of factors
self.theta = self.factors[["level","slope","curvature"]].mean().values
# precompute A(tau), B(tau) and build linear mapping
self.A_tau, self.B_tau = afns_AB_grid(
self.K, self.Q_ct, self.delta0, self.delta1, self.theta,
tau_grid=self.maturities
)
# Mapping: y_t(tau_i) = a_i + M_i dot X_t
# with a_i = -A(tau_i)/tau_i, M_i = -B(tau_i)/tau_i
self.a_vec = -self.A_tau / self.maturities
self.M_mat = -self.B_tau / self.maturities[:, None] # (n_tau, 3)
def yields_from_factors(self, X_t):
"""
Given X_t = [level, slope, curvature], return AFNS zero-coupon yields
at all self.maturities.
"""
X_t = np.asarray(X_t).reshape(3,)
return self.a_vec + self.M_mat @ X_t
def yields_from_path(self):
"""
Apply AFNS mapping to the whole factor path.
Returns DataFrame: index like factors_df, columns = maturities as strings.
"""
X = self.factors[["level","slope","curvature"]].values # (T,3)
Y = X @ self.M_mat.T + self.a_vec # (T, n_tau)
cols = [f"{m:.1f}" for m in self.maturities]
return pd.DataFrame(Y, index=self.factors.index, columns=cols)
# Suppose you have:
# A_hat, Q_hat from DL MLE
# factors_smooth_mle: DataFrame with level/slope/curvature
# maturities: e.g. np.array([1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 20.0, 30.0])
afns_model = AFNSFromDL(
A_P=A_hat,
Q_P=Q_hat,
factors_df=factors_smooth_mle,
maturities=np.array([1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 20.0, 30.0])
)
afns_yields = afns_model.yields_from_path()
afns_yields.head()
| date | 1.0 | 2.0 | 3.0 | 5.0 | 7.0 | 10.0 | 20.0 | 30.0 |
|---|---|---|---|---|---|---|---|---|
| 1990-01-31 | 0.098342 | 0.099302 | 0.089368 | 0.046221 | -0.023844 | -0.177594 | -1.126377 | -2.799207 |
| 1990-02-28 | 0.099142 | 0.100048 | 0.090092 | 0.046931 | -0.023138 | -0.176886 | -1.125657 | -2.798470 |
| 1990-03-31 | 0.101035 | 0.101805 | 0.091783 | 0.048563 | -0.021530 | -0.175293 | -1.124059 | -2.796845 |
| 1990-04-30 | 0.105012 | 0.106307 | 0.096513 | 0.053477 | -0.016537 | -0.170230 | -1.118841 | -2.791492 |
| 1990-05-31 | 0.100435 | 0.101578 | 0.091719 | 0.048629 | -0.021414 | -0.175142 | -1.123866 | -2.796636 |
AFNS approach used here¶
This second implementation keeps the same design but replaces the numerical ODE solution with the closed-form yield adjustment of the independent-factor AFNS model:
- factor dynamics stay under the physical measure $\mathbb{P}$ (estimated from data),
- the measurement equation gains the AFNS adjustment term as an intercept.
As before, this retains the DNS interpretability and estimation pipeline while introducing no-arbitrage consistency in yield construction.
def ns_loadings(tau, lam):
"""
Nelson–Siegel factor loadings:
B1(tau) = 1
B2(tau) = (1 - exp(-lam*tau)) / (lam*tau)
B3(tau) = B2(tau) - exp(-lam*tau)
"""
tau = np.asarray(tau, dtype=float)
eps = 1e-8
x = lam * np.maximum(tau, eps)
exp_term = np.exp(-x)
B1 = np.ones_like(tau)
B2 = (1.0 - exp_term) / x
B3 = B2 - exp_term
return B1, B2, B3
def afns_yield_adjustment(tau, lam, sig1, sig2, sig3):
"""
Independent-factor AFNS yield-adjustment term:
C(t,T)/(T-t) as function of tau, lam, sigma1..3.
"""
tau = np.asarray(tau, dtype=float)
lam = float(lam)
s1, s2, s3 = float(sig1), float(sig2), float(sig3)
eps = 1e-8
tau_safe = np.maximum(tau, eps)
x = lam * tau_safe
e1 = np.exp(-x)
e2 = np.exp(-2 * x)
# Level factor contribution
I1 = (s1**2 / 6.0) * tau_safe**2
# Slope factor contribution
I2 = s2**2 * (
1.0 / (2.0 * lam**2)
- (1.0 / lam**3) * (1.0 - e1) / tau_safe
+ (1.0 / (4.0 * lam**3)) * (1.0 - e2) / tau_safe
)
# Curvature factor contribution
I3 = s3**2 * (
1.0 / (2.0 * lam**2)
+ (1.0 / lam**2) * e1
- (1.0 / (4.0 * lam)) * tau_safe * e2
- (3.0 / (4.0 * lam**2)) * e2
- (2.0 / lam**3) * (1.0 - e1) / tau_safe
+ (5.0 / (8.0 * lam**3)) * (1.0 - e2) / tau_safe
)
return I1 + I2 + I3
def build_measurement_matrices(taus, lam, sig1, sig2, sig3):
"""
Build:
- H: N x 3 factor loading matrix
- a: N-dimensional intercept vector (AFNS adj term)
used in y_t = a + H X_t + eps_t
"""
taus = np.asarray(taus, dtype=float)
B1, B2, B3 = ns_loadings(taus, lam)
H = np.column_stack([B1, B2, B3])
C_adj = afns_yield_adjustment(taus, lam, sig1, sig2, sig3)
a = -C_adj
return H, a
Utility functions: DNS yields and parameter vector¶
We keep helper functions for:
- generating DNS yields from factors (baseline reference),
- unpacking the parameter vector $\theta$ used in optimization.
Packing parameters into a vector is standard for numerical optimization; unpacking makes the model readable and reduces bugs when mapping parameters to the matrices $(\Phi, Q, R)$, etc.
def dns_yields_from_factors(X_t, taus, lam):
"""
Produce DNS/Nelson–Siegel yields from factor vector X_t=[L,S,C]
"""
X_t = np.asarray(X_t, dtype=float)
L_t, S_t, C_t = X_t
B1, B2, B3 = ns_loadings(taus, lam)
return L_t * B1 + S_t * B2 + C_t * B3
def afns_yields_from_factors(X_t, taus, lam, sig1, sig2, sig3):
"""
AFNS yields = DNS yields - no-arbitrage adjustment
"""
dns = dns_yields_from_factors(X_t, taus, lam)
adj = afns_yield_adjustment(taus, lam, sig1, sig2, sig3)
return dns - adj
def unpack_theta(theta):
"""
Unpack the 14-parameter AFNS reduced-form vector.
"""
theta = np.asarray(theta, dtype=float)
phi_L, phi_S, phi_C = theta[0:3]
mu_L, mu_S, mu_C = theta[3:6]
log_qL, log_qS, log_qC = theta[6:9]
log_lam = theta[9]
log_sig1, log_sig2, log_sig3 = theta[10:13]
log_r = theta[13]
Phi = np.diag([phi_L, phi_S, phi_C])
mu = np.array([mu_L, mu_S, mu_C])
Q = np.diag([np.exp(log_qL)**2,
np.exp(log_qS)**2,
np.exp(log_qC)**2])
lam = np.exp(log_lam)
sig1, sig2, sig3 = np.exp(log_sig1), np.exp(log_sig2), np.exp(log_sig3)
r = np.exp(log_r)
return Phi, mu, Q, lam, sig1, sig2, sig3, r
CurveModelAFNS: reusable curve model object¶
CurveModelAFNS is the “model object” used downstream.
Responsibilities:
- store estimated parameters,
- simulate factor paths under $\mathbb{P}$-dynamics,
- transform factors into yields under DNS or AFNS.
This is useful because later sections (NII, LQ, RL) can treat the term structure model as a black box that produces:
- state variables (factors),
- and market observables (yields/forwards).
class CurveModelAFNS:
"""
DNS/AFNS term structure model with independent AR(1) P-dynamics
and AFNS no-arbitrage adjustment in measurement eq.
"""
def __init__(self, theta_hat, taus):
"""
theta_hat: estimated 14-parameter vector
taus: maturities in years (array-like)
"""
self.theta = np.asarray(theta_hat, dtype=float)
self.taus = np.asarray(taus, dtype=float)
(
self.Phi,
self.mu,
self.Q,
self.lam,
self.sig1,
self.sig2,
self.sig3,
self.r,
) = unpack_theta(self.theta)
# Build AFNS measurement structures
self.H, self.a = build_measurement_matrices(
self.taus, self.lam, self.sig1, self.sig2, self.sig3
)
self.R = (self.r**2) * np.eye(len(self.taus))
# ---------- simulators ----------
def simulate_factors(self, T, x0=None, rng=None):
"""
Simulate factor path X_t under P-dynamics (AR(1)) for t=0..T-1.
Returns array (T, 3).
"""
if rng is None:
rng = np.random.default_rng()
if x0 is None:
x0 = self.mu.copy()
X = np.zeros((T, 3))
X[0] = x0
for t in range(1, T):
eps = rng.multivariate_normal(mean=np.zeros(3), cov=self.Q)
X[t] = self.mu + self.Phi @ (X[t-1] - self.mu) + eps
return X
def simulate_yields(self, X, model="afns"):
"""
Given factor path X (T,3), return yields (T, N) under DNS or AFNS.
"""
X = np.asarray(X, dtype=float)
T = X.shape[0]
N = len(self.taus)
Y = np.zeros((T, N))
for t in range(T):
if model == "dns":
Y[t] = dns_yields_from_factors(X[t], self.taus, self.lam)
elif model == "afns":
Y[t] = afns_yields_from_factors(
X[t], self.taus, self.lam, self.sig1, self.sig2, self.sig3
)
else:
raise ValueError("model must be 'dns' or 'afns'")
return Y
AFNS estimation step¶
We estimate AFNS parameters by maximizing the likelihood of the yield panel under the AFNS state-space model.
Compared to DNS:
- the observation equation includes the AFNS adjustment,
- additional parameters govern the adjustment term (volatility-related).
The output is a single parameter vector $\theta$ that defines both:
- the factor dynamics,
- and the measurement mapping implied by no-arbitrage.
def kalman_loglik_afns(theta, Y, taus, beta0=None, P0=None):
"""
AFNS Kalman log-likelihood.
Model:
X_t - mu = Phi (X_{t-1} - mu) + eta_t, eta_t ~ N(0, Q)
y_t = a + H X_t + eps_t, eps_t ~ N(0, R)
where (Phi, mu, Q, lam, sig1..3, r) = unpack_theta(theta)
and H, a are built via AFNS (no-arbitrage adjustment).
Parameters
----------
theta : array-like, shape (14,)
Parameter vector as defined in unpack_theta.
Y : array-like, shape (T, N)
Observed yields (T time points, N maturities).
taus : array-like, shape (N,)
Maturities in years corresponding to columns of Y.
beta0 : array-like, shape (3,), optional
Initial state mean; if None, we use mu.
P0 : array-like, shape (3, 3), optional
Initial state covariance; if None, we use 0.1 * I.
Returns
-------
neg_loglik : float
Negative log-likelihood (for minimization).
"""
Y = np.asarray(Y, dtype=float)
T, N = Y.shape
taus = np.asarray(taus, dtype=float)
# Unpack parameters
Phi, mu, Q, lam, sig1, sig2, sig3, r = unpack_theta(theta)
# Measurement matrices (AFNS)
H, a = build_measurement_matrices(taus, lam, sig1, sig2, sig3)
R = (r**2) * np.eye(N)
# Adjust observations to absorb intercept: y'_t = y_t - a
Y_adj = Y - a[None, :]
k = 3 # number of factors
if beta0 is None:
beta0 = mu.copy()
if P0 is None:
P0 = 0.1 * np.eye(k)
beta_prev = beta0
P_prev = P0
I_k = np.eye(k)
loglik = 0.0
const = N * np.log(2 * np.pi)
for t in range(T):
# Prediction step: X_t|t-1
beta_pred = mu + Phi @ (beta_prev - mu)
P_pred = Phi @ P_prev @ Phi.T + Q
# Observation for this time
y_t = Y_adj[t, :] # (N,)
# Innovation covariance
S_t = H @ P_pred @ H.T + R # (N, N)
try:
S_inv = np.linalg.inv(S_t)
sign, logdet = np.linalg.slogdet(S_t)
if sign <= 0:
# non-PD covariance → penalize
return 1e6
except np.linalg.LinAlgError:
return 1e6
# Innovation
y_hat = H @ beta_pred # (N,)
innov = y_t - y_hat # (N,)
quad = innov.T @ S_inv @ innov
loglik_t = -0.5 * (const + logdet + quad)
loglik += loglik_t
# Update step
K_t = P_pred @ H.T @ S_inv # (3, N)
beta_filt = beta_pred + K_t @ innov
P_filt = (I_k - K_t @ H) @ P_pred
beta_prev, P_prev = beta_filt, P_filt
# Return negative log-likelihood for minimization
return -float(loglik)
def make_initial_theta(Y, taus):
"""
Construct a rough initial guess for the 14-parameter AFNS vector.
Y: (T, N) yields
taus: (N,) maturities
Returns
-------
theta0 : np.ndarray, shape (14,)
"""
Y = np.asarray(Y, dtype=float)
T, N = Y.shape
# crude guesses
# factor means: use average of shortest maturity for level, 0 for slope/curvature
mu_L0 = float(Y[:, 0].mean())
mu_S0 = 0.0
mu_C0 = 0.0
# AR coefficients: persistent level, less for slope/curvature
phi_L0, phi_S0, phi_C0 = 0.98, 0.90, 0.80
# state noise std devs (log scale)
log_qL0 = np.log(0.01)
log_qS0 = np.log(0.02)
log_qC0 = np.log(0.02)
# lambda starting value; note the Diebold–Li value 0.0609 assumes maturities in months (about 0.73 with maturities in years)
log_lam0 = np.log(0.06)
# AFNS vol parameters (continuous-time vols for level, slope, curvature)
log_sig1_0 = np.log(0.01)
log_sig2_0 = np.log(0.02)
log_sig3_0 = np.log(0.02)
# measurement noise std dev
log_r0 = np.log(0.001)
theta0 = np.array([
phi_L0, phi_S0, phi_C0,
mu_L0, mu_S0, mu_C0,
log_qL0, log_qS0, log_qC0,
log_lam0,
log_sig1_0, log_sig2_0, log_sig3_0,
log_r0
], dtype=float)
return theta0
def fit_afns_mle(Y, taus, theta0=None, maxiter=300):
"""
Estimate AFNS parameters by MLE via Kalman filter.
Parameters
----------
Y : (T, N) array
Yield panel (time x maturities).
taus : (N,) array
Maturities in years (aligned with columns of Y).
theta0 : array-like, optional
Initial guess for parameters; if None, we use make_initial_theta().
maxiter : int
Maximum number of optimizer iterations.
Returns
-------
theta_hat : np.ndarray
Estimated parameter vector.
res : OptimizeResult
Full scipy.optimize result object.
"""
Y = np.asarray(Y, dtype=float)
taus = np.asarray(taus, dtype=float)
if theta0 is None:
theta0 = make_initial_theta(Y, taus)
def objective(theta):
return kalman_loglik_afns(theta, Y, taus)
res = minimize(
objective,
theta0,
method="L-BFGS-B",
options={"maxiter": maxiter, "disp": True}
)
theta_hat = res.x
return theta_hat, res
Maturities and yield matrix used for AFNS¶
Here we select a fixed set of maturities (e.g., 1y, 2y, 3y, ..., 30y) and build:
- the yield matrix $Y$ as (T × N),
- the maturity vector $\tau$ in years.
This consistent maturity grid is important for:
- stable estimation,
- clean stress tests,
- and a well-defined mapping from factors to yields in the downstream hedging environment.
# Choose maturities (columns must exist in df_yields)
cols = ["1.0", "2.0", "3.0", "5.0", "7.0", "10.0", "20.0", "30.0"]
taus = np.array([float(c) for c in cols])
Y = df_yields[cols].values # shape (T, N)
theta0 = make_initial_theta(Y, taus)
theta_hat, res = fit_afns_mle(Y, taus, theta0=theta0, maxiter=300)
print("Converged:", res.success)
print("Final negative log-likelihood:", res.fun)
print("Estimated parameters:", theta_hat)
Converged: True
Final negative log-likelihood: -17635.449541385406
Estimated parameters: [ 0.99014058 0.98892233 0.95661017 0.06655483 -0.04882837 -0.02951158 -6.37122426 -5.89062987 -5.01021734 -1.54559202 -5.29410301 -5.8361747 -3.28498768 -6.92484316]
Sanity-checking the fitted AFNS model¶
We instantiate the AFNS curve model and verify basic behavior:
- yields are in a plausible range,
- simulated yields move smoothly with factors,
- the mapping is stable across maturities.
This “model health check” matters because any downstream hedging result is only meaningful if the term structure layer is sensible.
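These checks can be automated with a few array tests. The sketch below is one possible version: the helper `yield_health_check` and its thresholds (yields in decimals between -2% and 25%, at most 5 percentage points between adjacent maturities) are hypothetical choices, not part of the notebook.

```python
import numpy as np

def yield_health_check(Y, lo=-0.02, hi=0.25, max_adjacent_gap=0.05):
    """Basic plausibility checks on a (T, N) panel of yields in decimals."""
    Y = np.asarray(Y, dtype=float)
    gaps = np.abs(np.diff(Y, axis=1))     # differences between adjacent maturities
    return {
        "finite": bool(np.isfinite(Y).all()),
        "in_range": bool(((Y >= lo) & (Y <= hi)).all()),
        "smooth_across_maturities": bool((gaps <= max_adjacent_gap).all()),
    }

# Two toy panels: one plausible, one with a deliberately implausible long end
good = np.array([[0.030, 0.032, 0.035],
                 [0.031, 0.033, 0.036]])
bad = np.array([[0.098, 0.046, -2.799]])

print(yield_health_check(good))
print(yield_health_check(bad))
```

Running such a check on both `Y_dns` and `Y_afns` before moving on would flag a broken term-structure layer early, before it contaminates the hedging results.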
cm = CurveModelAFNS(theta_hat, taus)
# simulate 10 years monthly
T = 10 * 12
X_sim = cm.simulate_factors(T)
Y_dns = cm.simulate_yields(X_sim, model="dns")
Y_afns = cm.simulate_yields(X_sim, model="afns")
plt.plot(Y_afns)
State space: filtering and smoothing for AFNS¶
For decision problems, we often want a clean state estimate at every time step.
The following functions provide:
- Kalman filtering (online state estimation),
- RTS smoothing (offline best estimate using the full sample),
- log-likelihood computation for estimation.
The output (predicted/filtered/smoothed states) is also helpful for explaining uncertainty and model diagnostics.
def kalman_filter_afns(theta, Y, taus, beta0=None, P0=None):
"""
Run the Kalman filter for the AFNS model (closed-form yield adjustment in the measurement equation).
Model:
X_t - mu = Phi (X_{t-1} - mu) + eta_t, eta_t ~ N(0, Q)
y_t = a + H X_t + eps_t, eps_t ~ N(0, R)
We absorb the intercept a into the observations:
y'_t = y_t - a = H X_t + eps_t.
Parameters
----------
theta : array-like, shape (14,)
Parameter vector as in unpack_theta.
Y : (T, N) array
Yield panel.
taus : (N,) array
Maturities in years.
beta0 : (3,) array, optional
Initial state mean; default = mu.
P0 : (3,3) array, optional
Initial state covariance; default = 0.1 * I.
Returns
-------
filt_means : (T, 3)
Filtered state means E[X_t | Y_1..t].
filt_covs : (T, 3, 3)
Filtered covariance matrices.
pred_means : (T, 3)
One-step-ahead predicted means E[X_t | Y_1..t-1].
pred_covs : (T, 3, 3)
One-step-ahead predicted covariances.
loglik : float
Total log-likelihood (same as in MLE, for reference).
extra : dict
Dict with (Phi, mu, Q, H, a, R) for reuse in smoother, plotting, etc.
"""
Y = np.asarray(Y, dtype=float)
taus = np.asarray(taus, dtype=float)
T, N = Y.shape
Phi, mu, Q, lam, sig1, sig2, sig3, r = unpack_theta(theta)
H, a = build_measurement_matrices(taus, lam, sig1, sig2, sig3)
R = (r**2) * np.eye(N)
# absorb intercept
Y_adj = Y - a[None, :]
k = 3
if beta0 is None:
beta0 = mu.copy()
if P0 is None:
P0 = 0.1 * np.eye(k)
filt_means = np.zeros((T, k))
filt_covs = np.zeros((T, k, k))
pred_means = np.zeros((T, k))
pred_covs = np.zeros((T, k, k))
beta_prev = beta0
P_prev = P0
I_k = np.eye(k)
loglik = 0.0
const = N * np.log(2 * np.pi)
for t in range(T):
# prediction
beta_pred = mu + Phi @ (beta_prev - mu)
P_pred = Phi @ P_prev @ Phi.T + Q
pred_means[t] = beta_pred
pred_covs[t] = P_pred
y_t = Y_adj[t, :] # (N,)
S_t = H @ P_pred @ H.T + R # (N,N)
# innovation covariance must be PD
try:
S_inv = np.linalg.inv(S_t)
sign, logdet = np.linalg.slogdet(S_t)
if sign <= 0:
raise np.linalg.LinAlgError("Non-PD innovation covariance")
except np.linalg.LinAlgError:
# error handling
return None, None, None, None, -np.inf, {}
y_hat = H @ beta_pred # (N,)
innov = y_t - y_hat # (N,)
quad = innov.T @ S_inv @ innov
loglik_t = -0.5 * (const + logdet + quad)
loglik += loglik_t
# update
K_t = P_pred @ H.T @ S_inv # (3,N)
beta_filt = beta_pred + K_t @ innov
P_filt = (I_k - K_t @ H) @ P_pred
filt_means[t] = beta_filt
filt_covs[t] = P_filt
beta_prev, P_prev = beta_filt, P_filt
extra = {
"Phi": Phi,
"mu": mu,
"Q": Q,
"H": H,
"a": a,
"R": R,
}
return filt_means, filt_covs, pred_means, pred_covs, float(loglik), extra
Predicted vs filtered vs smoothed states¶
- Predicted $X_{t|t-1}$: what the model expects before seeing data at time $t$.
- Filtered $X_{t|t}$: updated estimate after observing yields at $t$.
- Smoothed $X_{t|T}$: best estimate using all observations $1,\ldots,T$.
For hedging experiments in this notebook we mainly use smoothed factors as a clean, denoised “state history” to drive baseline paths and stress tests.
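The ordering of the three estimates in terms of uncertainty can be verified on a scalar toy model. This is a minimal sketch, assuming an AR(1) state observed with noise; the parameter values are illustrative and unrelated to the fitted AFNS model.

```python
import numpy as np

rng = np.random.default_rng(1)
T, phi, q, r = 200, 0.95, 0.1**2, 0.2**2   # assumed toy parameters

# Simulate the latent state and noisy observations
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(q))
y = x + rng.normal(scale=np.sqrt(r), size=T)

# Kalman filter (scalar form)
pred_m, pred_v = np.zeros(T), np.zeros(T)
filt_m, filt_v = np.zeros(T), np.zeros(T)
m_prev, v_prev = 0.0, 1.0
for t in range(T):
    pred_m[t] = phi * m_prev
    pred_v[t] = phi**2 * v_prev + q
    gain = pred_v[t] / (pred_v[t] + r)
    filt_m[t] = pred_m[t] + gain * (y[t] - pred_m[t])
    filt_v[t] = (1.0 - gain) * pred_v[t]
    m_prev, v_prev = filt_m[t], filt_v[t]

# RTS smoother (backward pass)
smooth_m, smooth_v = filt_m.copy(), filt_v.copy()
for t in range(T - 2, -1, -1):
    j = filt_v[t] * phi / pred_v[t + 1]
    smooth_m[t] = filt_m[t] + j * (smooth_m[t + 1] - pred_m[t + 1])
    smooth_v[t] = filt_v[t] + j**2 * (smooth_v[t + 1] - pred_v[t + 1])

print("mean variances (pred, filt, smooth):",
      pred_v.mean(), filt_v.mean(), smooth_v.mean())
```

Smoothed variances never exceed filtered ones, which in turn are below the predicted ones. The price of that extra precision is look-ahead: smoothed states use future data, so they suit baseline paths and calibration but not a real-time decision rule.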
def rts_smoother_afns(filt_means, filt_covs, pred_means, pred_covs, Phi):
"""
Rauch–Tung–Striebel smoother for AFNS model.
Parameters
----------
filt_means : (T, 3)
Filtered means from Kalman filter.
filt_covs : (T, 3, 3)
Filtered covariances.
pred_means : (T, 3)
One-step-ahead predicted means.
pred_covs : (T, 3, 3)
One-step-ahead predicted covariances.
Phi : (3,3)
State transition matrix (constant over time in this model).
Returns
-------
smooth_means : (T, 3)
Smoothed state means E[X_t | Y_1..T].
smooth_covs : (T, 3, 3)
Smoothed state covariances.
"""
filt_means = np.asarray(filt_means, dtype=float)
filt_covs = np.asarray(filt_covs, dtype=float)
pred_means = np.asarray(pred_means, dtype=float)
pred_covs = np.asarray(pred_covs, dtype=float)
T, k = filt_means.shape
smooth_means = np.zeros_like(filt_means)
smooth_covs = np.zeros_like(filt_covs)
# initialize at T-1
smooth_means[-1] = filt_means[-1]
smooth_covs[-1] = filt_covs[-1]
Phi_T = Phi.T
for t in range(T - 2, -1, -1):
P_filt_t = filt_covs[t]
P_pred_next = pred_covs[t + 1]
# smoother gain
J_t = P_filt_t @ Phi_T @ np.linalg.inv(P_pred_next)
# smoothed mean
smooth_means[t] = (
filt_means[t]
+ J_t @ (smooth_means[t + 1] - pred_means[t + 1])
)
# smoothed covariance
smooth_covs[t] = (
P_filt_t
+ J_t @ (smooth_covs[t + 1] - P_pred_next) @ J_t.T
)
return smooth_means, smooth_covs
AFNS filtering/smoothing output¶
We run the AFNS filter and smoother and plot the filtered and smoothed factor estimates, which can then be compared with the simpler DNS versions.
At this point, we have a complete term-structure layer:
- no-arbitrage consistent measurement equation,
- estimated factor dynamics,
- and a usable state vector $X_t = (L_t, S_t, C_t)$.
Next we shift from modeling to decision-making: define a simplified banking book and formulate hedging as a control/RL problem.
# theta_hat from MLE, Y and taus from GSW data
filt_means, filt_covs, pred_means, pred_covs, loglik, extra = kalman_filter_afns(
theta_hat, Y, taus
)
Phi = extra["Phi"]
smooth_means, smooth_covs = rts_smoother_afns(
filt_means, filt_covs, pred_means, pred_covs, Phi
)
# smooth_means is (T,3): AFNS factors L_t, S_t, C_t
L = smooth_means[:, 0]
S = smooth_means[:, 1]
C = smooth_means[:, 2]
plt.plot(filt_means)
plt.legend(["Level", "Slope", "Curvature"])
plt.plot(smooth_means)
plt.legend(["Level", "Slope", "Curvature"])
Part III — Banking book and NII hedging experiment design¶
From State-Space Modeling to Optimal Control: Kalman Filtering and LQ Control¶
We now build on the state-space representation of the yield curve provided by the AFNS model to study estimation, prediction, and optimal hedging decisions. Once interest rate dynamics are expressed in state-space form, two powerful and closely related tools become available:
- Kalman filtering and smoothing, for inference on latent states;
- Optimal control theory, for designing dynamic hedging policies.
This section explains how these tools arise naturally from the AFNS state-space structure and how they are used in this work.
State-Space Structure as the Unifying Framework¶
Recall that the AFNS model provides a linear Gaussian state-space representation of the yield curve:
State (transition) equation¶
$ \mathbf{X}_{t+1} = \boldsymbol{\mu} + \Phi (\mathbf{X}_t - \boldsymbol{\mu}) + \boldsymbol{\varepsilon}_{t+1}, \qquad \boldsymbol{\varepsilon}_{t+1} \sim \mathcal{N}(0, Q) $
Measurement equation¶
$ \mathbf{y}_t = H \mathbf{X}_t + \mathbf{a} + \boldsymbol{\eta}_t, \qquad \boldsymbol{\eta}_t \sim \mathcal{N}(0, R) $
Here:
- $\mathbf{X}_t = (L_t, S_t, C_t)$ are latent yield-curve factors,
- $\mathbf{y}_t$ are observed yields across maturities.
This representation separates dynamics (how rates evolve) from measurement (how rates are observed), which is the key prerequisite for both filtering and control.
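The transition equation is written in mean-reverting form. An equivalent intercept form, $\mathbf{X}_{t+1} = c + \Phi \mathbf{X}_t + \boldsymbol{\varepsilon}_{t+1}$ with $c = (I - \Phi)\boldsymbol{\mu}$, is often more convenient for regression. The sketch below converts between the two forms; the helper names and the numerical values are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def mu_to_intercept(mu, Phi):
    """Intercept c of X_{t+1} = c + Phi X_t implied by the long-run mean mu."""
    return (np.eye(len(mu)) - Phi) @ mu

def intercept_to_mu(c, Phi):
    """Long-run mean mu = (I - Phi)^{-1} c; requires I - Phi to be invertible."""
    return np.linalg.solve(np.eye(len(c)) - Phi, c)

# Hypothetical values for illustration only
Phi_demo = np.diag([0.99, 0.95, 0.90])
mu_demo = np.array([0.05, -0.02, -0.01])

c_demo = mu_to_intercept(mu_demo, Phi_demo)
mu_back = intercept_to_mu(c_demo, Phi_demo)  # round trip recovers mu_demo
```

The inverse only exists when no eigenvalue of $\Phi$ equals one, which is one reason highly persistent factors make the long-run mean hard to pin down.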
Kalman Filtering: Inference on Latent Yield Factors¶
The Kalman filter is an estimator for linear Gaussian state-space models. In this project, it is used to infer the unobserved AFNS factors from observed yield data.
Prediction¶
Before observing yields at time $t$, the model produces a forecast: $ \hat{\mathbf{X}}_{t|t-1} = \boldsymbol{\mu} + \Phi (\hat{\mathbf{X}}_{t-1|t-1} - \boldsymbol{\mu}) $
This is a model-based prediction driven solely by the transition equation.
Filtering¶
After observing yields $\mathbf{y}_t$, the prediction is updated: $ \hat{\mathbf{X}}_{t|t} = \hat{\mathbf{X}}_{t|t-1} + K_t \big( \mathbf{y}_t - H \hat{\mathbf{X}}_{t|t-1} - \mathbf{a} \big) $
The Kalman gain $K_t$ balances:
- confidence in the model (via $Q$),
- confidence in the data (via $R$).
This filtered estimate represents real-time knowledge of the yield curve.
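The prediction and filtering steps can be sketched as a single function. This is a minimal illustration under the state-space notation above, not the notebook's `kalman_filter_afns`; the function name `kalman_step` and the scalar toy inputs are assumptions made for the example.

```python
import numpy as np

def kalman_step(x_prev, P_prev, y, mu, Phi, Q, H, a, R):
    """One predict/update cycle for
    X_{t+1} = mu + Phi (X_t - mu) + eps,  y_t = H X_t + a + eta."""
    # Prediction: propagate mean and covariance through the transition equation
    x_pred = mu + Phi @ (x_prev - mu)
    P_pred = Phi @ P_prev @ Phi.T + Q
    # Innovation and its covariance
    v = y - (H @ x_pred + a)
    S = H @ P_pred @ H.T + R
    # Gain K = P_pred H' S^{-1}, computed with a linear solve (S and P_pred are symmetric)
    K = np.linalg.solve(S, H @ P_pred).T
    # Update
    x_filt = x_pred + K @ v
    P_filt = (np.eye(len(x_prev)) - K @ H) @ P_pred
    return x_filt, P_filt, K

# Scalar toy example (all numbers are made up)
x_f, P_f, K_gain = kalman_step(
    x_prev=np.array([2.0]), P_prev=np.array([[1.0]]), y=np.array([3.0]),
    mu=np.array([0.0]), Phi=np.array([[0.5]]), Q=np.array([[1.0]]),
    H=np.array([[1.0]]), a=np.array([0.0]), R=np.array([[1.0]]),
)
```

In the toy case the predicted variance is 1.25 and the measurement variance is 1, so the gain is 1.25 / 2.25: the update moves a little more than halfway from the prediction toward the observation.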
Smoothing¶
For structural analysis, the Rauch–Tung–Striebel (RTS) smoother is applied after filtering. It combines past, present, and future information to produce: $ \hat{\mathbf{X}}_{t|T} $
In this project, smoothed AFNS factors provide:
- low-noise state estimates,
- a stable reference path for calibration,
- and a clean baseline for control experiments.
From Estimation to Decision-Making: Augmenting the State¶
To study hedging, the state vector is augmented to include the hedge inventory: $ \mathbf{s}_t = \begin{pmatrix} L_t \\ S_t \\ C_t \\ h_t \end{pmatrix} $
The augmented dynamics are linear: $ \mathbf{s}_{t+1} = A \mathbf{s}_t + B u_t + \boldsymbol{\xi}_{t+1} $
where:
- $u_t = \Delta h_t$ is the hedge adjustment (control),
- yield-curve factors evolve exogenously,
- hedge inventory evolves deterministically given control.
This linear state-space system forms the basis of optimal control theory.
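A small sketch of the augmented matrices makes the structure explicit. It assumes a hypothetical diagonal $\Phi$ and omits the constant intercept of the factor dynamics; the check at the end illustrates that the control reaches only the hedge inventory, so the factor block must be stable on its own for the system to be stabilizable.

```python
import numpy as np

Phi_demo = np.diag([0.99, 0.95, 0.90])  # hypothetical factor transition matrix

A = np.zeros((4, 4))
A[:3, :3] = Phi_demo   # factors evolve exogenously
A[3, 3] = 1.0          # h_{t+1} = h_t + u_t
B = np.zeros((4, 1))
B[3, 0] = 1.0          # the control only moves the hedge inventory

# Controllability matrix [B, AB, A^2 B, A^3 B]
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(4)])
rank_ctrb = np.linalg.matrix_rank(ctrb)   # only the inventory direction is reachable

# The uncontrollable modes are the eigenvalues of Phi; (A, B) is stabilizable
# when all of them lie strictly inside the unit circle.
rho_factors = np.max(np.abs(np.linalg.eigvals(Phi_demo)))
```

The hedge cannot change where rates go; it can only change how much of the rate move reaches NII.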
Optimal Control Theory in This Context¶
Optimal control asks:
Given stochastic state dynamics, how should control actions be chosen to optimize a long-run objective?
In this project, the objective is to stabilize Net Interest Income (NII) while controlling hedge usage.
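After linearizing around a reference factor vector, NII is approximately linear in the state: $ \text{NII}_{t+1} \approx H_x^\top \mathbf{s}_t + \text{noise} $, where $H_x$ collects the sensitivities of NII to the factors and to the hedge inventory.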
This linear–Gaussian structure makes classical control tools applicable.
Linear–Quadratic (LQ) Control with L2 Costs¶
Quadratic Objective¶
The LQ framework assumes a quadratic objective: $ \min_{u_t} \mathbb{E} \sum_{t=0}^{\infty} \left( \text{NII}_{t+1}^2 + \lambda_h h_t^2 + \lambda_u u_t^2 \right) $
Interpretation:
- penalize NII volatility (interest rate risk),
- penalize large hedge inventories (balance-sheet usage),
- penalize frequent hedge adjustments (trading intensity).
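The three penalties combine into the standard quadratic form $\mathbf{s}^\top Q_s \mathbf{s} + u^\top R u$. The sketch below checks that the matrix form and the term-by-term form agree for one state; the sensitivity vector, state, and trade are illustrative numbers, and the NII term uses a linear approximation $H_x^\top \mathbf{s}$.

```python
import numpy as np

alpha_nii, lambda_h, lambda_u = 1.0, 1e-6, 1e-3   # example weights
H_x = np.array([0.0, -1.35, 0.99, 1.7e-4])        # illustrative NII sensitivities

e_h = np.array([0.0, 0.0, 0.0, 1.0])              # selects the inventory component
Q_s = alpha_nii * np.outer(H_x, H_x) + lambda_h * np.outer(e_h, e_h)
R = np.array([[lambda_u]])

s = np.array([0.07, -0.04, -0.03, 10.0])  # hypothetical state (L, S, C, h)
u = np.array([0.5])                        # hypothetical trade

cost_matrix_form = s @ Q_s @ s + u @ R @ u
cost_term_by_term = (
    alpha_nii * (H_x @ s) ** 2 + lambda_h * s[3] ** 2 + lambda_u * u[0] ** 2
)
```

Because $Q_s$ is built from a rank-one NII term plus the inventory penalty, setting $\lambda_h = 0$ leaves inventory penalized only through its small effect on NII.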
Optimal Policy¶
Under linear dynamics and quadratic costs:
- the value function is quadratic,
- the optimal policy is linear in the state: $ u_t = -K \mathbf{s}_t $
The feedback matrix $K$ is obtained by solving the discrete algebraic Riccati equation (DARE).
This solution is:
- analytical,
- stable,
- fully interpretable.
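As a rough illustration of how the feedback matrix arises, the sketch below iterates the Riccati recursion on a two-state toy system (one exogenous AR(1) factor plus a hedge inventory) and then checks the fixed-point residual and closed-loop stability. All numbers and the function name `dare_by_iteration` are assumptions made for the example.

```python
import numpy as np

def dare_by_iteration(A, B, Q, R, max_iter=5000, tol=1e-12):
    """Fixed-point iteration on the Riccati recursion; returns (P, K) with u_t = -K x_t."""
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Toy system (values are made up): factor x_{t+1} = 0.9 x_t, inventory h_{t+1} = h_t + u_t
A = np.array([[0.9, 0.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
H_x = np.array([1.0, 0.5])                              # NII sensitivity to (x, h)
Q = np.outer(H_x, H_x) + 0.01 * np.diag([0.0, 1.0])     # NII term + inventory penalty
R = np.array([[0.1]])                                   # trading penalty

P, K = dare_by_iteration(A, B, Q, R)
residual = Q + A.T @ P @ A - A.T @ P @ B @ K - P        # should be ~0 at the solution
rho_closed_loop = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
```

The closed-loop matrix $A - BK$ keeps the factor's own eigenvalue (the hedge cannot alter rate dynamics) and replaces the inventory's unit root with a stable one.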
Why LQ Control Is a Natural Benchmark¶
LQ control represents the best possible policy under the assumptions of:
- linear dynamics,
- Gaussian shocks,
- symmetric (quadratic) costs.
In this project:
- the AFNS model satisfies these assumptions almost exactly,
- making LQ control an ideal theoretical benchmark.
Importantly, LQ control is not an approximation here. It is the optimal solution to a well-defined problem.
Role in This Project¶
The LQ solution serves three purposes:
- Economic benchmark: it defines what optimal hedging looks like in a frictionless quadratic world.
- Diagnostic tool: deviations from LQ performance reveal where assumptions break down.
- Reference point for RL: reinforcement learning is expected to add value only when costs become non-quadratic (e.g. L1 transaction costs), a setting where LQ theory no longer applies.
Conceptual Summary¶
- The AFNS model provides a linear Gaussian state-space description of interest rate dynamics.
- The Kalman filter extracts latent yield-curve factors optimally.
- Augmenting the state with hedge inventory transforms the model into a controlled system.
- Linear–quadratic control delivers the optimal hedging policy under quadratic costs.
- This classical solution establishes the benchmark against which more flexible methods are evaluated.
We now connect the term-structure state $X_t$ to a simplified IRRBB objective.
Main ingredients:
- a stylized balance sheet with representative asset and liability repricing maturities,
- a hedging instrument (FRA-style payoff in this notebook),
- an objective function that trades off NII risk vs hedge usage and trading costs.
This is intentionally simplified: the goal is a clean, interpretable sandbox where we can compare classical control and RL under controlled assumptions.
LQ control benchmark (quadratic costs)¶
We first set up a classical Linear–Quadratic (LQ) benchmark:
- State includes curve factors and hedge inventory.
- Control is the hedge adjustment $u_t = \Delta h_t$.
- Objective penalizes:
- NII variability (risk term),
- hedge inventory (balance sheet usage),
- trading intensity (turnover).
This is the regime where classical control is expected to perform very well because the dynamics are linear-Gaussian and the objective is quadratic.
# ============================================================
# 0) USER INPUTS
# ============================================================
# Required:
# - X_smooth: array (T, 3) of Kalman-smoothed AFNS factors [L,S,C]
X_smooth = smooth_means
# - cm: calibrated CurveModelAFNS (needs lam, sig1..sig3 and AFNS yield function)
# Banking book + frequency
dt = 1.0 / 12.0 # monthly in years
tau_A = 3.0 # assets repricing maturity (years)
tau_L = 1.0 # liabilities repricing maturity (years)
A_notional = 100.0
L_notional = 100.0
# LQ weights (tune later)
alpha_nii = 1.0   # strength of the "penalize NII" term
lambda_u = 1e-3   # trading penalty (smaller => more aggressive hedging)
lambda_h = 1e-6   # hedge inventory penalty
# Stress scenario
shock_bps = 200 # +200 bps
shock = shock_bps / 10000.0
# ============================================================
# 1) AFNS yield + forward-rate utilities
# ============================================================
def afns_yield_single_from_cm(cm, X, tau):
    """
    Compute the AFNS yield y(tau; X) using the parameters stored in cm.
    Depends on the existing afns_yields_from_factors implementation.
    """
    taus = np.array([tau], dtype=float)
    y = afns_yields_from_factors(X, taus, cm.lam, cm.sig1, cm.sig2, cm.sig3)
    return float(y[0])
def forward_rate_cc_from_cm(cm, X, tau1, tau2):
    """
    Continuously compounded forward rate f(tau1, tau2):
    f = (tau2*y(tau2) - tau1*y(tau1)) / (tau2 - tau1)
    """
    y1 = afns_yield_single_from_cm(cm, X, tau1)
    y2 = afns_yield_single_from_cm(cm, X, tau2)
    return (tau2 * y2 - tau1 * y1) / (tau2 - tau1)
# ============================================================
# 2) NII definition with FRA hedge
# ============================================================
def compute_unhedged_nii_path(cm, X_path, A_notional, L_notional, tau_A, tau_L, dt):
    """
    Unhedged NII_{t+1} = A*y_t(tauA)*dt - L*y_t(tauL)*dt
    Returns array length T-1.
    """
    T = X_path.shape[0]
    NII0 = np.zeros(T-1)
    for t in range(T-1):
        yA = afns_yield_single_from_cm(cm, X_path[t], tau_A)
        yL = afns_yield_single_from_cm(cm, X_path[t], tau_L)
        NII0[t] = A_notional * yA * dt - L_notional * yL * dt
    return NII0
def compute_hedged_nii_path_FRA(cm, X_path, h_path, A_notional, L_notional, tau_A, tau_L, dt):
    """
    Hedged NII_{t+1} = A*y_t(tauA)*dt - L*y_t(tauL)*dt + h_t*(K_t - y_{t+1}(tauL))*dt
    with K_t = forward(tauL, tauL+dt).
    """
    T = X_path.shape[0]
    NIIh = np.zeros(T-1)
    for t in range(T-1):
        yA = afns_yield_single_from_cm(cm, X_path[t], tau_A)
        yL = afns_yield_single_from_cm(cm, X_path[t], tau_L)
        K_t = forward_rate_cc_from_cm(cm, X_path[t], tau_L, tau_L + dt)
        y_float_next = afns_yield_single_from_cm(cm, X_path[t+1], tau_L)
        NIIh[t] = (A_notional * yA * dt
                   - L_notional * yL * dt
                   + h_path[t] * (K_t - y_float_next) * dt)
    return NIIh
# ============================================================
# 3) Estimate factor dynamics from smoothed factors (VAR(1))
#    X_{t+1} = c + Phi X_t + eps
#    We'll convert to mean-reverting form with mu if desired.
# ============================================================
def fit_var1(X):
    """
    Fit VAR(1): X_{t+1} = c + Phi X_t + eps, via OLS.
    Returns c (3,), Phi (3,3), Sigma (3,3).
    """
    X = np.asarray(X, dtype=float)
    Y = X[1:]   # (T-1,3) next-period factors
    Z = X[:-1]  # (T-1,3) current-period factors
    # add intercept
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])  # (T-1, 1+3)
    # OLS for all three equations at once
    B = np.linalg.lstsq(Z1, Y, rcond=None)[0]  # (1+3, 3)
    c = B[0]        # (3,)
    Phi = B[1:].T   # (3,3); transpose so that row i holds the coefficients of equation i
    resid = Y - Z1 @ B  # (T-1,3)
    # residual covariance with a degrees-of-freedom correction (1 intercept + 3 lags)
    Sigma = (resid.T @ resid) / (resid.shape[0] - (1 + 3))
    return c, Phi, Sigma
# ============================================================
# 4) Build LQ problem from scratch
# State x_t = [L,S,C,h]
# Dynamics: X_{t+1} = c + Phi X_t + eps, h_{t+1} = h_t + u_t
#
# Objective: penalize (approx NII)^2 + lambda_h h^2 + lambda_u u^2
#
# Key step: build H_x (sensitivity of NII to state) numerically
# ============================================================
def numerical_grad_y(cm, X_ref, tau, eps=1e-5):
    """
    Numerical gradient of y(tau;X) wrt X=(L,S,C) using central differences.
    Returns grad (3,).
    """
    grad = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        yp = afns_yield_single_from_cm(cm, X_ref + d, tau)
        ym = afns_yield_single_from_cm(cm, X_ref - d, tau)
        grad[i] = (yp - ym) / (2 * eps)
    return grad
def build_Hx_QR_from_nii(cm, X_ref,
                         A_notional, L_notional, tau_A, tau_L, dt,
                         alpha_nii, lambda_h, lambda_u):
    """
    Build H_x, Q_s, R for LQ:
    approx NII(x_t) ≈ H_x' [X_t; h_t]
    => (NII)^2 ≈ x' (alpha * H_x H_x') x
    """
    # Factor sensitivity of the base NII part (using numerical gradients)
    grad_yA = numerical_grad_y(cm, X_ref, tau_A)
    grad_yL = numerical_grad_y(cm, X_ref, tau_L)
    Hx_X = dt * (A_notional * grad_yA - L_notional * grad_yL)  # (3,)
    # Hedge sensitivity via FRA: d/dh of hedge payoff term at ref
    K_ref = forward_rate_cc_from_cm(cm, X_ref, tau_L, tau_L + dt)
    yL_ref = afns_yield_single_from_cm(cm, X_ref, tau_L)
    d_h = (K_ref - yL_ref) * dt  # scalar
    H_x = np.zeros(4)
    H_x[:3] = Hx_X
    H_x[3] = d_h
    # Quadratic cost matrices
    Q_s = alpha_nii * np.outer(H_x, H_x)
    if lambda_h > 0:
        e4 = np.array([0.0, 0.0, 0.0, 1.0])
        Q_s = Q_s + lambda_h * np.outer(e4, e4)
    R = np.array([[lambda_u]], dtype=float)
    return H_x, Q_s, R
def build_AB_from_Phi(Phi):
    A = np.zeros((4, 4))
    A[:3, :3] = Phi
    A[3, 3] = 1.0
    B = np.zeros((4, 1))
    B[3, 0] = 1.0
    return A, B
def solve_discrete_riccati(A, B, Q, R, max_iter=20000, tol=1e-12):
    """
    Iterative solution to discrete algebraic Riccati equation (DARE).
    Returns P, K where u_t = -K x_t.
    """
    A = np.asarray(A, float)
    B = np.asarray(B, float)
    Q = np.asarray(Q, float)
    R = np.asarray(R, float)
    P = Q.copy()
    for _ in range(max_iter):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    # recompute the gain from the final P
    S = R + B.T @ P @ B
    K = np.linalg.solve(S, B.T @ P @ A)
    return P, K
# ============================================================
# 5) Run LQ hedging on an exogenous factor path
# ============================================================
def run_lq_hedge_on_path(cm, X_path, Phi, H_x, Q_s, R,
                         A_notional, L_notional, tau_A, tau_L, dt):
    """
    Given an exogenous factor path X_path and VAR(1) Phi (for control design),
    compute LQ hedge policy and resulting hedge path h_t and NII.
    """
    A, B = build_AB_from_Phi(Phi)
    P, K = solve_discrete_riccati(A, B, Q_s, R)
    # simulate hedge
    T = X_path.shape[0]
    h = np.zeros(T)
    u = np.zeros(T-1)
    for t in range(T-1):
        x_t = np.array([X_path[t,0], X_path[t,1], X_path[t,2], h[t]])
        # K @ x_t has shape (1,); .item() extracts the scalar without the
        # NumPy >= 1.25 deprecation warning raised by float() on a 1-element array
        u_t = -(K @ x_t).item()
        u[t] = u_t
        h[t+1] = h[t] + u_t
    NII_unhedged = compute_unhedged_nii_path(cm, X_path, A_notional, L_notional, tau_A, tau_L, dt)
    NII_hedged = compute_hedged_nii_path_FRA(cm, X_path, h, A_notional, L_notional, tau_A, tau_L, dt)
    return {
        "K": K,
        "h": h,
        "u": u,
        "NII_unhedged": NII_unhedged,
        "NII_hedged": NII_hedged,
    }
# ============================================================
# 6) Stress path builder (shock at t=0, then propagate with Phi, no noise)
# ============================================================
def make_stress_path_from_var1(X0, c, Phi, T, shock_vec=None):
    """
    Deterministic stressed path:
    X_0 = X0 + shock_vec
    X_{t+1} = c + Phi X_t
    """
    X = np.zeros((T, 3))
    if shock_vec is None:
        shock_vec = np.zeros(3)
    X[0] = X0 + shock_vec
    for t in range(T-1):
        X[t+1] = c + Phi @ X[t]
    return X
# ============================================================
# 7) MAIN RUN: baseline + stress
# ============================================================
# --- Fit factor dynamics from smoothed factors ---
c_hat, Phi_hat, Sigma_hat = fit_var1(X_smooth)
X_ref = X_smooth.mean(axis=0)
print("Fitted VAR(1) Phi:\n", Phi_hat)
print("Reference X_ref (mean of smoothed):", X_ref)
# --- Build LQ cost from NII sensitivities ---
H_x, Q_s, R = build_Hx_QR_from_nii(
cm, X_ref,
A_notional, L_notional, tau_A, tau_L, dt,
alpha_nii, lambda_h, lambda_u
)
print("H_x =", H_x)
print("Q_s max abs =", np.max(np.abs(Q_s)))
print("R =", R)
# --- Baseline: use the observed smoothed factor path as exogenous ---
res_base = run_lq_hedge_on_path(
cm, X_smooth, Phi_hat, H_x, Q_s, R,
A_notional, L_notional, tau_A, tau_L, dt
)
print("LQ gain K =", res_base["K"])
print("max |h| baseline:", np.max(np.abs(res_base["h"])))
print("std NII unhedged baseline:", np.std(res_base["NII_unhedged"]))
print("std NII hedged baseline:", np.std(res_base["NII_hedged"]))
plt.figure(figsize=(9,4))
plt.plot(res_base["NII_unhedged"], label="Unhedged (baseline)")
plt.plot(res_base["NII_hedged"], label="LQ hedged (baseline)")
plt.title("Baseline path (smoothed factors): Unhedged vs LQ-hedged NII")
plt.xlabel("t (months)")
plt.ylabel("NII")
plt.legend()
plt.tight_layout()
plt.show()
plt.figure(figsize=(9,3))
plt.plot(res_base["h"], label="Hedge notional h_t (baseline)")
plt.title("LQ hedge position (baseline)")
plt.xlabel("t (months)")
plt.ylabel("h_t")
plt.legend()
plt.tight_layout()
plt.show()
# --- Stress: +200bp parallel shock implemented as a level-factor bump at t=0 ---
# You can also shock slope/curvature; start with level.
shock_vec = np.array([shock, 0.0, 0.0])
X_stress = make_stress_path_from_var1(
X0=X_ref, c=c_hat, Phi=Phi_hat, T=X_smooth.shape[0], shock_vec=shock_vec
)
res_stress = run_lq_hedge_on_path(
cm, X_stress, Phi_hat, H_x, Q_s, R,
A_notional, L_notional, tau_A, tau_L, dt
)
print("\n--- STRESS RESULTS (+200bp level shock at t=0, deterministic VAR1 propagation) ---")
print("LQ gain K =", res_stress["K"])
print("max |h| stress:", np.max(np.abs(res_stress["h"])))
print("std NII unhedged stress:", np.std(res_stress["NII_unhedged"]))
print("std NII hedged stress:", np.std(res_stress["NII_hedged"]))
plt.figure(figsize=(9,4))
plt.plot(res_stress["NII_unhedged"], label="Unhedged (stress)")
plt.plot(res_stress["NII_hedged"], label="LQ hedged (stress)")
plt.title("Stress (+200bp level shock at t=0): Unhedged vs LQ-hedged NII")
plt.xlabel("t (months)")
plt.ylabel("NII")
plt.legend()
plt.tight_layout()
plt.show()
plt.figure(figsize=(9,3))
plt.plot(res_stress["h"], label="Hedge notional h_t (stress)")
plt.title("LQ hedge position under stress")
plt.xlabel("t (months)")
plt.ylabel("h_t")
plt.legend()
plt.tight_layout()
plt.show()
Fitted VAR(1) Phi:
 [[ 0.99370822  0.00405803 -0.00149313]
 [-0.01363542  0.99260183  0.00425663]
 [ 0.01979163 -0.02327738  0.95583683]]
Reference X_ref (mean of smoothed): [ 0.07275833 -0.0443458  -0.02780776]
H_x = [ 0.00000000e+00 -1.34869058e+00  9.88642779e-01  1.69524561e-04]
Q_s max abs = 1.8189662905957733
R = [[0.001]]
LQ gain K = [[ 3.25468959 -6.40606207  1.65585809  0.03156371]]
max |h| baseline: 19.29354216567544
std NII unhedged baseline: 0.041502314399371805
std NII hedged baseline: 0.037672283687605466
--- STRESS RESULTS (+200bp level shock at t=0, deterministic VAR1 propagation) ---
LQ gain K = [[ 3.25468959 -6.40606207  1.65585809  0.03156371]]
max |h| stress: 17.558518446200466
std NII unhedged stress: 0.011522314785212952
std NII hedged stress: 0.010116693836186045
def summarize_paths(N0, N1, h, u):
    """
    Produce standard metrics for hedging quality and cost.
    N0: unhedged NII (T-1,)
    N1: hedged NII (T-1,)
    h: hedge notional (T,)
    u: hedge trades (T-1,)
    """
    out = {}
    out["std_unhedged"] = float(np.std(N0))
    out["std_hedged"] = float(np.std(N1))
    out["std_reduction_%"] = float(100.0 * (1.0 - out["std_hedged"] / out["std_unhedged"])) if out["std_unhedged"] > 0 else np.nan
    out["min_unhedged"] = float(np.min(N0))
    out["min_hedged"] = float(np.min(N1))
    out["p05_unhedged"] = float(np.quantile(N0, 0.05))
    out["p05_hedged"] = float(np.quantile(N1, 0.05))
    out["mean_unhedged"] = float(np.mean(N0))
    out["mean_hedged"] = float(np.mean(N1))
    # "Cost" proxies
    out["mean_abs_h"] = float(np.mean(np.abs(h)))
    out["max_abs_h"] = float(np.max(np.abs(h)))
    out["mean_abs_u"] = float(np.mean(np.abs(u)))
    out["max_abs_u"] = float(np.max(np.abs(u)))
    return out
def sweep_lq_hyperparams(cm, X_path, Phi_hat, X_ref,
                         A_notional, L_notional, tau_A, tau_L, dt,
                         alpha_nii,
                         lambda_u_grid,
                         lambda_h_grid):
    """
    Sweeps (lambda_u, lambda_h) and returns a DataFrame of metrics.
    Requires build_Hx_QR_from_nii(...) and run_lq_hedge_on_path(...).
    """
    rows = []
    for lambda_u in lambda_u_grid:
        for lambda_h in lambda_h_grid:
            H_x, Q_s, R = build_Hx_QR_from_nii(
                cm, X_ref,
                A_notional, L_notional, tau_A, tau_L, dt,
                alpha_nii=alpha_nii,
                lambda_h=lambda_h,
                lambda_u=lambda_u
            )
            res = run_lq_hedge_on_path(
                cm, X_path, Phi_hat, H_x, Q_s, R,
                A_notional, L_notional, tau_A, tau_L, dt
            )
            metrics = summarize_paths(
                res["NII_unhedged"], res["NII_hedged"], res["h"], res["u"]
            )
            row = {
                "lambda_u": float(lambda_u),
                "lambda_h": float(lambda_h),
                "Hx_hedge_sensitivity": float(H_x[3]),
                "K_max_abs": float(np.max(np.abs(res["K"]))),
                **metrics
            }
            rows.append(row)
    df = pd.DataFrame(rows)
    # derived columns: explicit aliases for the turnover and inventory metrics
    df["turnover_proxy"] = df["mean_abs_u"]
    df["inventory_proxy"] = df["mean_abs_h"]
    # Sort by best hedging first (std reduction), then lower turnover
    df = df.sort_values(["std_reduction_%", "turnover_proxy"], ascending=[False, True]).reset_index(drop=True)
    return df
lambda_u_grid = [1e-9, 1e-6, 1e-3]
lambda_h_grid = [1e-9, 1e-6, 1e-3]
df_sweep = sweep_lq_hyperparams(
cm=cm,
X_path=X_smooth, # or X_stress
Phi_hat=Phi_hat,
X_ref=X_ref,
A_notional=100.0,
L_notional=100.0,
tau_A=3.0,
tau_L=1.0,
dt=1/12,
alpha_nii=1.0,
lambda_u_grid=lambda_u_grid,
lambda_h_grid=lambda_h_grid
)
df_sweep
| | lambda_u | lambda_h | Hx_hedge_sensitivity | K_max_abs | std_unhedged | std_hedged | std_reduction_% | min_unhedged | min_hedged | p05_unhedged | p05_hedged | mean_unhedged | mean_hedged | mean_abs_h | max_abs_h | mean_abs_u | max_abs_u | turnover_proxy | inventory_proxy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.000000e-03 | 1.000000e-06 | 0.00017 | 6.406062 | 0.041502 | 0.037672 | 9.228475 | -0.069115 | -0.064571 | -0.028967 | -0.028228 | 0.031514 | 0.028723 | 14.301091 | 19.293542 | 0.130948 | 0.445572 | 0.130948 | 14.301091 |
| 1 | 1.000000e-06 | 1.000000e-06 | 0.00017 | 140.444405 | 0.041502 | 0.038307 | 7.699966 | -0.069115 | -0.073067 | -0.028967 | -0.029062 | 0.031514 | 0.028599 | 7.382631 | 19.662846 | 0.599310 | 3.206865 | 0.599310 | 7.382631 |
| 2 | 1.000000e-09 | 1.000000e-06 | 0.00017 | 224.181261 | 0.041502 | 0.038391 | 7.497340 | -0.069115 | -0.073538 | -0.028967 | -0.029146 | 0.031514 | 0.028622 | 7.151036 | 19.601891 | 0.779073 | 4.209173 | 0.779073 | 7.151036 |
| 3 | 1.000000e-03 | 1.000000e-03 | 0.00017 | 0.618039 | 0.041502 | 0.041499 | 0.008182 | -0.069115 | -0.069119 | -0.028967 | -0.028967 | 0.031514 | 0.031511 | 0.007600 | 0.020229 | 0.000614 | 0.003281 | 0.000614 | 0.007600 |
| 4 | 1.000000e-06 | 1.000000e-03 | 0.00017 | 0.999002 | 0.041502 | 0.041499 | 0.007974 | -0.069115 | -0.069120 | -0.028967 | -0.028967 | 0.031514 | 0.031511 | 0.007356 | 0.020165 | 0.000801 | 0.004330 | 0.000801 | 0.007356 |
| 5 | 1.000000e-09 | 1.000000e-03 | 0.00017 | 0.999999 | 0.041502 | 0.041499 | 0.007973 | -0.069115 | -0.069120 | -0.028967 | -0.028967 | 0.031514 | 0.031511 | 0.007356 | 0.020165 | 0.000802 | 0.004332 | 0.000802 | 0.007356 |
| 6 | 1.000000e-03 | 1.000000e-09 | 0.00017 | 19.492054 | 0.041502 | 0.059200 | -42.643736 | -0.069115 | -0.233215 | -0.028967 | -0.097223 | 0.031514 | -0.004273 | 241.943762 | 322.147839 | 0.812684 | 2.626348 | 0.812684 | 241.943762 |
| 7 | 1.000000e-09 | 1.000000e-09 | 0.00017 | 7519.772929 | 0.041502 | 0.120635 | -190.670443 | -0.069115 | -0.541424 | -0.028967 | -0.350242 | 0.031514 | -0.068566 | 247.785757 | 678.185855 | 26.404902 | 143.727011 | 26.404902 | 247.785757 |
| 8 | 1.000000e-06 | 1.000000e-09 | 0.00017 | 1256.590292 | 0.041502 | 0.126251 | -204.201778 | -0.069115 | -0.538397 | -0.028967 | -0.348999 | 0.031514 | -0.071237 | 325.598944 | 700.395119 | 12.267414 | 42.837237 | 12.267414 | 325.598944 |
def plot_lq_tradeoff(df, title="LQ hyperparameter sweep: NII risk vs turnover"):
    """
    Scatter plot:
    x = turnover (mean_abs_u)
    y = hedged NII volatility (std_hedged)
    """
    x = df["mean_abs_u"].values
    y = df["std_hedged"].values
    plt.figure(figsize=(8,5))
    plt.scatter(x, y)
    # annotate points with (lambda_u, lambda_h)
    for _, row in df.iterrows():
        plt.annotate(
            f"u={row['lambda_u']:.0e}, h={row['lambda_h']:.0e}",
            (row["mean_abs_u"], row["std_hedged"]),
            fontsize=8,
            xytext=(4, 4),
            textcoords="offset points"
        )
    plt.xlabel("Turnover proxy: mean(|u_t|)")
    plt.ylabel("Risk proxy: std(NII_hedged)")
    plt.title(title)
    plt.tight_layout()
    plt.show()
Choosing penalties: risk–turnover trade-off curve¶
Rather than picking $\lambda_u, \lambda_h$ arbitrarily, we sweep penalty values and measure:
- NII risk reduction
- versus trading activity (turnover) and inventory usage.
This produces a Pareto-style trade-off curve.
We then select a benchmark point that delivers meaningful risk reduction without unrealistic trading.
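One way to make the "Pareto-style" idea concrete is to flag the sweep points that are not dominated in the (turnover, risk) plane. The sketch below is an illustration on made-up numbers; `pareto_mask` is a hypothetical helper, not a function used elsewhere in the notebook, and it ignores the inventory dimension.

```python
import numpy as np

def pareto_mask(turnover, risk):
    """True where no other point has lower-or-equal turnover and risk
    with at least one of the two strictly lower."""
    turnover = np.asarray(turnover, dtype=float)
    risk = np.asarray(risk, dtype=float)
    keep = np.ones(turnover.size, dtype=bool)
    for i in range(turnover.size):
        dominated_by = (
            (turnover <= turnover[i]) & (risk <= risk[i])
            & ((turnover < turnover[i]) | (risk < risk[i]))
        )
        keep[i] = not dominated_by.any()
    return keep

# Hypothetical sweep results: (mean |u_t|, std of hedged NII)
turnover_demo = [1.0, 0.5, 0.6, 0.1]
risk_demo = [0.20, 0.30, 0.35, 0.50]
mask_demo = pareto_mask(turnover_demo, risk_demo)  # third point is dominated by the second
```

Applied to a sweep table, the same function could be called as `pareto_mask(df["mean_abs_u"], df["std_hedged"])`. A point that is dominated in these two metrics may still be preferable once inventory usage is taken into account.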
plot_lq_tradeoff(df_sweep)
def summarize_paths(N0, N1, h, u):
    out = {}
    out["std_unhedged"] = float(np.std(N0))
    out["std_hedged"] = float(np.std(N1))
    out["std_reduction_%"] = float(100.0 * (1.0 - out["std_hedged"]/out["std_unhedged"])) if out["std_unhedged"] > 0 else np.nan
    out["p05_unhedged"] = float(np.quantile(N0, 0.05))
    out["p05_hedged"] = float(np.quantile(N1, 0.05))
    out["min_unhedged"] = float(np.min(N0))
    out["min_hedged"] = float(np.min(N1))
    out["mean_unhedged"] = float(np.mean(N0))
    out["mean_hedged"] = float(np.mean(N1))
    out["mean_abs_h"] = float(np.mean(np.abs(h)))
    out["max_abs_h"] = float(np.max(np.abs(h)))
    out["mean_abs_u"] = float(np.mean(np.abs(u)))
    out["max_abs_u"] = float(np.max(np.abs(u)))
    return out
def plot_lq_tradeoff(df, title="LQ sweep: NII risk vs turnover"):
    """
    Scatter: x=turnover (mean|u|), y=risk (std hedged).
    Annotate with lambdas.
    """
    x = df["mean_abs_u"].values
    y = df["std_hedged"].values
    plt.figure(figsize=(8,5))
    plt.scatter(x, y)
    for _, row in df.iterrows():
        plt.annotate(
            f"u={row['lambda_u']:.0e}, h={row['lambda_h']:.0e}",
            (row["mean_abs_u"], row["std_hedged"]),
            fontsize=8, xytext=(4,4), textcoords="offset points"
        )
    plt.xlabel("Turnover proxy: mean(|u_t|)")
    plt.ylabel("Risk proxy: std(NII_hedged)")
    plt.title(title)
    plt.tight_layout()
    plt.show()
Simplified NII model and hedge instrument¶
We compute a stylized monthly NII:
- Assets repricing at a representative maturity $\tau_A$
- Liabilities repricing at $\tau_L$
- Hedge is modeled as an FRA-like payoff linked to forward vs realized short/roll rate
This structure is simple enough for transparency but still captures the core IRRBB mechanism: changes in the yield curve shift the rates that drive asset income and liability expense, and hedging offsets part of that sensitivity.
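A one-period worked example may help fix the sign conventions. The rates and hedge notional below are hypothetical; the formula follows the hedged NII definition used in this notebook, NII = A·y_t(τ_A)·dt − L·y_t(τ_L)·dt + h·(K − y_{t+1}(τ_L))·dt.

```python
A_notional, L_notional = 100.0, 100.0
dt = 1.0 / 12.0

# Hypothetical rates for illustration
y_A = 0.04        # asset repricing yield at time t
y_L = 0.03        # liability repricing yield at time t
K_fwd = 0.031     # forward rate locked in at time t
y_L_next = 0.035  # realized liability-maturity yield at time t+1
h = 10.0          # hedge notional

nii_unhedged = A_notional * y_A * dt - L_notional * y_L * dt   # 1/12, about 0.0833
hedge_payoff = h * (K_fwd - y_L_next) * dt                     # negative: rates rose above K
nii_hedged = nii_unhedged + hedge_payoff                       # about 0.08
```

With this convention a positive `h` gains when the realized rate ends below the locked-in forward and loses when it ends above, so the sign of the position determines which rate direction is being hedged.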
# Monthly setup
dt = 1.0/12.0
tau_A = 3.0
tau_L = 1.0
A_notional = 100.0
L_notional = 100.0
# Fit VAR(1) to smoothed factors
c_hat, Phi_hat, Sigma_hat = fit_var1(X_smooth)
X_ref = X_smooth.mean(axis=0)
print("Phi_hat:\n", Phi_hat)
print("X_ref:", X_ref)
# Stress scenario builders (deterministic paths for clean comparison)
def scenario_parallel_up(T, shock_bps=200):
    shock = shock_bps/10000.0
    shock_vec = np.array([shock, 0.0, 0.0])  # Level shock
    return make_stress_path_from_var1(X0=X_ref, c=c_hat, Phi=Phi_hat, T=T, shock_vec=shock_vec)
def scenario_bear_steepener(T, level_bps=200, slope_bps=100):
    # Simple stylized: +level and -slope (more steepness in NS sign convention)
    level = level_bps/10000.0
    slope = slope_bps/10000.0
    shock_vec = np.array([level, -slope, 0.0])
    return make_stress_path_from_var1(X0=X_ref, c=c_hat, Phi=Phi_hat, T=T, shock_vec=shock_vec)
def scenario_high_vol(T, vol_scale=3.0, seed=123):
    # Stochastic path with amplified innovations.
    # Note: vol_scale multiplies the innovation covariance, so the innovation
    # standard deviation is scaled by sqrt(vol_scale).
    rng = np.random.default_rng(seed)
    X = np.zeros((T,3))
    X[0] = X_ref.copy()
    for t in range(T-1):
        eps = rng.multivariate_normal(np.zeros(3), vol_scale * Sigma_hat)
        X[t+1] = c_hat + Phi_hat @ X[t] + eps
    return X
T = X_smooth.shape[0]
X_parallel = scenario_parallel_up(T, shock_bps=200)
X_steepen = scenario_bear_steepener(T, level_bps=200, slope_bps=100)
X_highvol = scenario_high_vol(T, vol_scale=3.0, seed=123)
Phi_hat:
 [[ 0.99370822  0.00405803 -0.00149313]
 [-0.01363542  0.99260183  0.00425663]
 [ 0.01979163 -0.02327738  0.95583683]]
X_ref: [ 0.07275833 -0.0443458  -0.02780776]
alpha_nii = 1.0
lambda_u_grid = [1e-4, 1e-3, 1e-2]
lambda_h_grid = [1e-7, 1e-6, 1e-5]
rows = []
for lambda_u in lambda_u_grid:
    for lambda_h in lambda_h_grid:
        H_x, Q_s, R = build_Hx_QR_from_nii(
            cm, X_ref,
            A_notional, L_notional, tau_A, tau_L, dt,
            alpha_nii=alpha_nii,
            lambda_h=lambda_h,
            lambda_u=lambda_u
        )
        res = run_lq_hedge_on_path(
            cm, X_smooth, Phi_hat, H_x, Q_s, R,
            A_notional, L_notional, tau_A, tau_L, dt
        )
        metrics = summarize_paths(res["NII_unhedged"], res["NII_hedged"], res["h"], res["u"])
        rows.append({
            "lambda_u": float(lambda_u),
            "lambda_h": float(lambda_h),
            "Hx_hedge_sensitivity": float(H_x[3]),
            "K_max_abs": float(np.max(np.abs(res["K"]))),
            **metrics
        })
df_lq = pd.DataFrame(rows)
# filter out degeneracy if needed
df_lq = df_lq[df_lq["Hx_hedge_sensitivity"].abs() > 1e-10].copy()
# sort: prefer bigger std reduction and lower turnover
df_lq = df_lq.sort_values(["std_reduction_%", "mean_abs_u"], ascending=[False, True]).reset_index(drop=True)
df_lq
| | lambda_u | lambda_h | Hx_hedge_sensitivity | K_max_abs | std_unhedged | std_hedged | std_reduction_% | p05_unhedged | p05_hedged | min_unhedged | min_hedged | mean_unhedged | mean_hedged | mean_abs_h | max_abs_h | mean_abs_u | max_abs_u |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0001 | 1.000000e-07 | 0.00017 | 58.550083 | 0.041502 | 0.024047 | 42.057613 | -0.028967 | -0.030056 | -0.069115 | -0.067088 | 0.031514 | 0.008964 | 112.845336 | 155.551100 | 1.149311 | 3.772153 |
| 1 | 0.0010 | 1.000000e-07 | 0.00017 | 11.867674 | 0.041502 | 0.026164 | 36.957726 | -0.028967 | -0.024765 | -0.069115 | -0.074443 | 0.031514 | 0.015548 | 96.388903 | 123.418701 | 0.405471 | 1.539168 |
| 2 | 0.0100 | 1.000000e-07 | 0.00017 | 2.374251 | 0.041502 | 0.033512 | 19.252694 | -0.028967 | -0.023946 | -0.069115 | -0.058499 | 0.031514 | 0.026712 | 34.571771 | 48.226221 | 0.114668 | 0.317721 |
| 3 | 0.0010 | 1.000000e-06 | 0.00017 | 6.406062 | 0.041502 | 0.037672 | 9.228475 | -0.028967 | -0.028228 | -0.069115 | -0.064571 | 0.031514 | 0.028723 | 14.301091 | 19.293542 | 0.130948 | 0.445572 |
| 4 | 0.0001 | 1.000000e-06 | 0.00017 | 22.071626 | 0.041502 | 0.037781 | 8.965698 | -0.028967 | -0.028472 | -0.069115 | -0.068063 | 0.031514 | 0.028564 | 10.908506 | 20.326023 | 0.280277 | 0.874768 |
| 5 | 0.0100 | 1.000000e-06 | 0.00017 | 1.255649 | 0.041502 | 0.038588 | 7.022279 | -0.028967 | -0.028169 | -0.069115 | -0.063581 | 0.031514 | 0.029659 | 11.381624 | 14.652016 | 0.045579 | 0.170295 |
| 6 | 0.0100 | 1.000000e-05 | 0.00017 | 0.647022 | 0.041502 | 0.041100 | 0.969242 | -0.028967 | -0.028891 | -0.069115 | -0.068646 | 0.031514 | 0.031228 | 1.468673 | 1.976866 | 0.013285 | 0.045391 |
| 7 | 0.0010 | 1.000000e-05 | 0.00017 | 2.236258 | 0.041502 | 0.041112 | 0.940380 | -0.028967 | -0.028916 | -0.069115 | -0.069002 | 0.031514 | 0.031211 | 1.123422 | 2.084656 | 0.028567 | 0.088878 |
| 8 | 0.0001 | 1.000000e-05 | 0.00017 | 6.329965 | 0.041502 | 0.041142 | 0.868566 | -0.028967 | -0.028950 | -0.069115 | -0.069369 | 0.031514 | 0.031210 | 0.849649 | 2.061880 | 0.044613 | 0.178204 |
plot_lq_tradeoff(df_lq, title="Baseline path: LQ sweep (risk vs turnover)")
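# Row 3 of the sorted sweep: lambda_u=1e-3, lambda_h=1e-6 (positional, so it depends on the sort order)
benchmark = df_lq.iloc[3].to_dict()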
benchmark
{'lambda_u': 0.001,
'lambda_h': 1e-06,
'Hx_hedge_sensitivity': 0.00016952456054489685,
'K_max_abs': 6.406062071472268,
'std_unhedged': 0.041502314399371805,
'std_hedged': 0.037672283687605466,
'std_reduction_%': 9.228475007225889,
'p05_unhedged': -0.028967123151622646,
'p05_hedged': -0.028228493090004823,
'min_unhedged': -0.06911495202798612,
'min_hedged': -0.06457091968364496,
'mean_unhedged': 0.0315137176683032,
'mean_hedged': 0.028723195974091707,
'mean_abs_h': 14.301090737582067,
'max_abs_h': 19.29354216567544,
'mean_abs_u': 0.13094791972470568,
'max_abs_u': 0.44557190322919743}
Stress testing protocol¶
We evaluate policies under:
- baseline (historical smoothed factors),
- parallel shock (+200bp level),
- high-vol regime (scaled factor innovations),
- steepener scenario.
For each scenario we compare:
- unhedged NII path,
- LQ-hedged NII path,
- and later RL-hedged NII path.
Metrics include volatility and downside (e.g., 5% quantile), plus turnover/inventory proxies.
def eval_policy_on_path(X_path, lambda_u, lambda_h, label):
    H_x, Q_s, R = build_Hx_QR_from_nii(
        cm, X_ref,
        A_notional, L_notional, tau_A, tau_L, dt,
        alpha_nii=alpha_nii,
        lambda_h=lambda_h,
        lambda_u=lambda_u
    )
    res = run_lq_hedge_on_path(
        cm, X_path, Phi_hat, H_x, Q_s, R,
        A_notional, L_notional, tau_A, tau_L, dt
    )
    metrics = summarize_paths(res["NII_unhedged"], res["NII_hedged"], res["h"], res["u"])
    return {
        "scenario": label,
        "lambda_u": float(lambda_u),
        "lambda_h": float(lambda_h),
        **metrics
    }, res
# Choose lambdas (use preferred benchmark)
lambda_u_star = float(benchmark["lambda_u"])
lambda_h_star = float(benchmark["lambda_h"])
rows = []
res_store = {}
for label, X_path in [
    ("baseline (smoothed)", X_smooth),
    ("stress: +200bp parallel", X_parallel),
    ("stress: bear steepener", X_steepen),
    ("stress: high vol x3", X_highvol),
]:
    row, res = eval_policy_on_path(X_path, lambda_u_star, lambda_h_star, label)
    rows.append(row)
    res_store[label] = res
df_stress = pd.DataFrame(rows)
df_stress
| | scenario | lambda_u | lambda_h | std_unhedged | std_hedged | std_reduction_% | p05_unhedged | p05_hedged | min_unhedged | min_hedged | mean_unhedged | mean_hedged | mean_abs_h | max_abs_h | mean_abs_u | max_abs_u |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | baseline (smoothed) | 0.001 | 0.000001 | 0.041502 | 0.037672 | 9.228475 | -0.028967 | -0.028228 | -0.069115 | -0.064571 | 0.031514 | 0.028723 | 14.301091 | 19.293542 | 0.130948 | 0.445572 |
| 1 | stress: +200bp parallel | 0.001 | 0.000001 | 0.011522 | 0.010117 | 12.199119 | 0.021739 | 0.020354 | 0.021490 | 0.020128 | 0.036319 | 0.033284 | 14.585011 | 17.558518 | 0.050177 | 0.539936 |
| 2 | stress: bear steepener | 0.001 | 0.000001 | 0.015246 | 0.013545 | 11.152510 | 0.021091 | 0.019772 | 0.020990 | 0.019680 | 0.038386 | 0.035136 | 14.777920 | 18.258270 | 0.053618 | 0.603996 |
| 3 | stress: high vol x3 | 0.001 | 0.000001 | 0.031582 | 0.028799 | 8.811852 | -0.010008 | -0.010316 | -0.058414 | -0.054684 | 0.040042 | 0.036726 | 13.971398 | 17.529290 | 0.090683 | 0.484180 |
Policy evaluation on identical scenarios¶
To make the comparison fair, policies are evaluated on the same underlying factor/yield paths.
This isolates the policy effect:
- differences in NII are due to hedging decisions,
- not due to different simulated market paths.
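The idea of evaluating every policy on one shared shock path can be sketched on a toy model. Everything here is an assumption made for illustration: a scalar AR(1) factor, a hedge ratio `h`, an NII proxy `(1 - h_t) * x_{t+1}`, and the hypothetical helpers `simulate_nii`, `no_hedge`, and `half_hedge`. None of it is the notebook's NII model.

```python
import numpy as np

def simulate_nii(policy, shocks, phi=0.95):
    """Toy model: scalar AR(1) factor x, hedge ratio h, NII proxy (1 - h_t) * x_{t+1}."""
    x, h = 0.0, 0.0
    nii = []
    for eps in shocks:
        x_next = phi * x + eps
        nii.append((1.0 - h) * x_next)   # the pre-trade hedge h_t applies to this period
        h += policy(x_next, h)           # trade after observing the new state
        x = x_next
    return np.array(nii)

def no_hedge(x, h):
    return 0.0

def half_hedge(x, h):
    return 0.5 * (0.5 - h)               # move halfway toward a 50% hedge each step

# One shock path, drawn once and reused for every policy
shocks = np.random.default_rng(42).normal(0.0, 0.01, size=120)
nii_unhedged = simulate_nii(no_hedge, shocks)
nii_hedged = simulate_nii(half_hedge, shocks)
```

Because both runs consume the same `shocks` array, any difference between `nii_unhedged` and `nii_hedged` comes from the policy alone.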
This section produces the main “headline” plots:
- NII trajectories under stress,
- hedge inventory paths,
- and summary metrics.
def plot_paths(res, title):
    # NII trajectories: unhedged vs LQ-hedged
    plt.figure(figsize=(9,4))
    plt.plot(res["NII_unhedged"], label="Unhedged")
    plt.plot(res["NII_hedged"], label="LQ hedged")
    plt.title(title)
    plt.xlabel("t (months)")
    plt.ylabel("NII")
    plt.legend()
    plt.tight_layout()
    plt.show()
    # Hedge inventory path
    plt.figure(figsize=(9,3))
    plt.plot(res["h"], label="h_t")
    plt.title(title + " — hedge notional")
    plt.xlabel("t (months)")
    plt.ylabel("h_t")
    plt.legend()
    plt.tight_layout()
    plt.show()
plot_paths(res_store["stress: +200bp parallel"], "LQ benchmark under +200bp parallel shock")
plot_paths(res_store["stress: bear steepener"], "LQ benchmark under bear steepener")
plot_paths(res_store["stress: high vol x3"], "LQ benchmark under high-vol regime")
Part IV — Reinforcement Learning¶
Beyond Quadratic Costs: L1 Transaction Costs and the Transition to Reinforcement Learning¶
The linear–quadratic (LQ) control framework provides a powerful and interpretable benchmark for dynamic hedging when costs are quadratic. However, real-world hedging problems often involve non-quadratic frictions, most notably transaction costs that scale linearly with trade size.
This section explains how introducing L1 costs fundamentally changes the control problem and motivates the use of reinforcement learning.
Economic Motivation for L1 Transaction Costs¶
Quadratic trading costs imply that:
- small trades are almost free,
- frequent rebalancing is optimal,
- hedge adjustments are smooth and continuous.
In practice, interest rate hedging instruments (IRS, FRAs, and other swaps) are subject to:
- bid–ask spreads,
- brokerage fees,
- balance-sheet and operational costs.
These costs are better approximated by linear (L1) penalties: $ \text{Transaction cost at time } t \;\propto\; |u_t| $
Economically, this means:
- each unit traded incurs the same marginal cost, however small the trade,
- small trades are not necessarily cheap,
- inactivity can be optimal over wide regions of the state space.
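The contrast between the two cost shapes can be checked with a few lines of arithmetic. The sketch below uses illustrative, uncalibrated values for `lambda_u` and `kappa_u`, and the function names are hypothetical. It shows that the per-unit cost of a quadratic penalty vanishes for small trades, and that splitting one trade into ten pieces cuts the quadratic cost tenfold while leaving the L1 cost unchanged.

```python
def quadratic_cost(u, lambda_u):
    """Quadratic trading penalty lambda_u * u^2."""
    return lambda_u * u ** 2

def l1_cost(u, kappa_u):
    """Linear (L1) trading penalty kappa_u * |u|."""
    return kappa_u * abs(u)

lambda_u, kappa_u = 1e-3, 1e-3   # illustrative values, not calibrated

# Cost per unit traded: shrinks with trade size under the quadratic penalty,
# stays constant under the L1 penalty
for u in (0.01, 0.1, 1.0):
    per_unit_quad = quadratic_cost(u, lambda_u) / u
    per_unit_l1 = l1_cost(u, kappa_u) / u
    print(f"u={u}: quadratic per unit={per_unit_quad:.1e}, L1 per unit={per_unit_l1:.1e}")

# Splitting a trade of size 1 into ten trades of size 0.1
pieces = [0.1] * 10
quad_split = sum(quadratic_cost(p, lambda_u) for p in pieces)
l1_split = sum(l1_cost(p, kappa_u) for p in pieces)
```

This is why quadratic penalties favor many small adjustments, while under L1 costs there is no gain from slicing a trade.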
The Control Problem with L1 Costs¶
Replacing the quadratic trading penalty with an L1 penalty leads to the objective: $ \min_{u_t} \mathbb{E} \sum_{t=0}^{\infty} \left( \text{NII}_{t+1}^2 + \lambda_h h_t^2 + \kappa_u |u_t| \right) $
Key differences from the LQ case:
- the cost function is non-differentiable at $u_t = 0$,
- the value function is no longer quadratic,
- the optimal policy is no longer linear in the state.
As a result, classical LQ theory no longer applies.
What Breaks in Classical Optimal Control¶
Under L1 costs:
- the Riccati equation cannot be used,
- there is no closed-form optimal feedback matrix,
- certainty equivalence fails.
Most importantly:
The optimal policy develops endogenous “no-trade regions.”
That is, there exist states where: $ u_t^\star = 0 $ even though the hedge is imperfect.
This behavior is well known from problems with proportional transaction costs (singular control) and from inventory management, but it cannot be represented by linear feedback rules.
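A one-period analogy makes the no-trade region concrete. Suppose the hedge gap is `e` and a single trade `u` is chosen to minimize `0.5 * (e - u)**2 + kappa * abs(u)`. The minimizer is the soft-thresholding rule, which is exactly zero whenever `abs(e) <= kappa`. With a quadratic penalty `0.5 * lam * u**2` instead, the minimizer `e / (1 + lam)` is linear in `e` and never exactly zero for a nonzero gap. This is a static sketch with hypothetical function names, not the dynamic problem solved in this notebook.

```python
import math

def l1_optimal_trade(e, kappa):
    """argmin_u 0.5*(e - u)^2 + kappa*|u|  (soft thresholding)."""
    return math.copysign(max(abs(e) - kappa, 0.0), e)

def quadratic_optimal_trade(e, lam):
    """argmin_u 0.5*(e - u)^2 + 0.5*lam*u^2  (linear rule)."""
    return e / (1.0 + lam)

kappa, lam = 0.1, 0.1   # illustrative penalty levels
for e in (-0.3, -0.05, 0.0, 0.05, 0.3):
    print(f"gap={e:+.2f}: L1 trade={l1_optimal_trade(e, kappa):+.3f}, "
          f"quadratic trade={quadratic_optimal_trade(e, lam):+.3f}")
```

Gaps inside the band `[-kappa, kappa]` produce no trade under the L1 penalty, while larger gaps are only traded down to the edge of the band.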
State-Space Dynamics Remain Valid¶
Crucially, introducing L1 costs does not invalidate the state-space model.
The dynamics remain: $ \mathbf{s}_{t+1} = A \mathbf{s}_t + B u_t + \boldsymbol{\xi}_{t+1} $
where:
- $\mathbf{s}_t = (L_t, S_t, C_t, h_t)$,
- yield-curve factors evolve exogenously under AFNS dynamics,
- control affects only the hedge inventory.
What changes is not the system, but the optimization problem defined on top of it.
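The block structure of $A$ and $B$ can be written out explicitly. The sketch below assumes the factors are expressed as deviations from their long-run mean, so the VAR(1) intercept drops out, and it uses a hypothetical transition matrix `Phi`, not the estimated `Phi_hat`.

```python
import numpy as np

# Hypothetical VAR(1) transition for the centered factors (L, S, C)
Phi = np.array([[0.98, 0.00, 0.00],
                [0.00, 0.95, 0.02],
                [0.00, 0.00, 0.90]])

# Augmented state s = (L, S, C, h)
A = np.zeros((4, 4))
A[:3, :3] = Phi          # factors evolve on their own
A[3, 3] = 1.0            # hedge inventory carries over: h_{t+1} = h_t + u_t
B = np.array([0.0, 0.0, 0.0, 1.0])   # the control enters only through h

s = np.array([0.01, -0.005, 0.002, 5.0])
u = 0.5
s_next_mean = A @ s + B * u          # conditional mean of s_{t+1} (noise omitted)
```

The zero entries in the first three rows of `B` encode the statement that trading does not move the yield curve.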
Why Reinforcement Learning Is Appropriate¶
Reinforcement learning (RL) solves dynamic decision problems by:
- interacting with the environment,
- learning value functions or policies directly,
- without requiring smoothness or quadratic structure.
In this project, RL is applied to:
- the same AFNS-based state-space system,
- the same NII definition,
- but with an objective that includes L1 transaction costs.
This allows the agent to:
- learn sparse trading policies,
- internalize proportional trading frictions,
- and optimally balance NII risk against trading intensity.
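One way the L1 objective could be attached to an existing environment is to recompute the reward from the quantities the environment already reports. The sketch below is duck-typed and dependency-free: `L1CostWrapper` and `_StubEnv` are hypothetical names, and it assumes that `step` returns an `info` dict with keys `nii`, `u`, and the post-trade inventory `h`. For training with a library such as Stable-Baselines3, the same logic would need to live in a proper `gymnasium.Wrapper` subclass or in the environment itself.

```python
class L1CostWrapper:
    """Recompute the reward with an L1 trading cost (hypothetical, duck-typed helper)."""

    def __init__(self, env, lambda_h, kappa_u):
        self.env = env
        self.lambda_h = float(lambda_h)
        self.kappa_u = float(kappa_u)

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        u = info["u"]
        h_pre = info["h"] - u   # assumption: info["h"] is the post-trade inventory
        reward = -(info["nii"] ** 2
                   + self.lambda_h * h_pre ** 2
                   + self.kappa_u * abs(u))
        return obs, reward, terminated, truncated, info


class _StubEnv:
    """Toy stand-in with the same reset/step signature (illustration only)."""

    def __init__(self):
        self.h = 0.0

    def reset(self, **kwargs):
        self.h = 0.0
        return [0.0], {}

    def step(self, action):
        u = float(action)
        self.h += u
        info = {"nii": 0.1, "h": self.h, "u": u}   # constant NII, for the example only
        return [self.h], 0.0, False, False, info


env = L1CostWrapper(_StubEnv(), lambda_h=1e-6, kappa_u=1e-3)
env.reset()
_, r1, *_ = env.step(2.0)    # first trade: h goes from 0 to 2
_, r2, *_ = env.step(-1.0)   # second trade: h goes from 2 to 1
```

The only change relative to the quadratic objective is the last term of the reward: `kappa_u * abs(u)` in place of `lambda_u * u**2`.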
Relationship Between LQ Control and RL¶
The L1-RL formulation can be viewed as a generalization of LQ control:
- When transaction costs are quadratic, a well-trained RL policy should approach the LQ solution.
- When costs are linear, RL departs from linear feedback and learns non-smooth, state-dependent rules.
Thus:
RL does not replace classical control. It extends it to settings where classical assumptions break down.
Role in This Project¶
The comparison between LQ control and RL serves a clear purpose:
- LQ control defines the optimal benchmark under idealized quadratic costs.
- RL with L1 costs captures realistic trading frictions and balance-sheet considerations.
- The performance gap between the two highlights the economic impact of non-quadratic costs.
This framework makes it possible to assess when and why more flexible decision rules are required in interest rate risk management.
Conceptual Summary¶
- The AFNS model provides a coherent state-space environment.
- LQ control is optimal under quadratic costs and serves as a theoretical benchmark.
- L1 transaction costs break the assumptions underlying LQ theory.
- Reinforcement learning naturally handles non-smooth objectives and sparse actions.
- Comparing LQ and RL policies reveals the economic consequences of realistic trading frictions.
In this sense, reinforcement learning appears not as a black-box alternative, but as the appropriate solution once the structure of the control problem changes.
import gymnasium as gym
from gymnasium import spaces

class IrrbbNiiHedgeEnv(gym.Env):
    """
    RL environment for IRRBB-style NII hedging using a FRA hedge.
    State: [L, S, C, h] (optionally centered around X_ref)
    Action: u = Δh (hedge adjustment), continuous and bounded
    Dynamics:
        X_{t+1} = c + Phi X_t + eps_t, eps_t ~ N(0, Sigma)
        h_{t+1} = h_t + u_t
    Reward (to maximize):
        r_t = - [ NII_{t+1}^2 + lambda_h*h_t^2 + lambda_u*u_t^2 ]
    """
    metadata = {"render_modes": []}

    def __init__(self,
                 cm,
                 c_hat, Phi_hat, Sigma_hat, X_ref,
                 A_notional=100.0, L_notional=100.0,
                 tau_A=3.0, tau_L=1.0, dt=1/12,
                 lambda_u=1e-3, lambda_h=1e-6,
                 action_max=1.0,
                 episode_len=240,
                 center_state=True,
                 seed=123):
        super().__init__()
        self.cm = cm
        self.c = np.asarray(c_hat, float)
        self.Phi = np.asarray(Phi_hat, float)
        self.Sigma = np.asarray(Sigma_hat, float)
        self.X_ref = np.asarray(X_ref, float)
        self.A_notional = float(A_notional)
        self.L_notional = float(L_notional)
        self.tau_A = float(tau_A)
        self.tau_L = float(tau_L)
        self.dt = float(dt)
        self.lambda_u = float(lambda_u)
        self.lambda_h = float(lambda_h)
        self.action_max = float(action_max)
        self.episode_len = int(episode_len)
        self.center_state = bool(center_state)
        self.rng = np.random.default_rng(seed)
        # Observation: 4D continuous
        obs_high = np.full((4,), np.inf, dtype=np.float32)
        self.observation_space = spaces.Box(low=-obs_high, high=obs_high, dtype=np.float32)
        # Action: 1D continuous Δh bounded
        self.action_space = spaces.Box(low=-self.action_max, high=self.action_max, shape=(1,), dtype=np.float32)
        self.t = 0
        self.X = None
        self.h = None

    # --- AFNS helpers ---
    def _y(self, X, tau):
        # Zero-coupon yield at maturity tau implied by factors X
        taus = np.array([tau], dtype=float)
        return float(afns_yields_from_factors(X, taus, self.cm.lam, self.cm.sig1, self.cm.sig2, self.cm.sig3)[0])

    def _fwd_cc(self, X, tau1, tau2):
        # Continuously compounded forward rate between tau1 and tau2
        y1 = self._y(X, tau1)
        y2 = self._y(X, tau2)
        return (tau2 * y2 - tau1 * y1) / (tau2 - tau1)

    def _nii_one_step(self, X_t, X_next, h_t):
        # Base repricing NII (simplified)
        yA = self._y(X_t, self.tau_A)
        yL = self._y(X_t, self.tau_L)
        # FRA fixed rate: forward from tau_L to tau_L+dt
        K_t = self._fwd_cc(X_t, self.tau_L, self.tau_L + self.dt)
        y_float_next = self._y(X_next, self.tau_L)
        return (self.A_notional * yA * self.dt
                - self.L_notional * yL * self.dt
                + h_t * (K_t - y_float_next) * self.dt)

    def _obs(self):
        x = np.array([self.X[0], self.X[1], self.X[2], self.h], dtype=np.float32)
        if self.center_state:
            x[:3] = x[:3] - self.X_ref.astype(np.float32)
        return x

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # follow the Gymnasium reset convention
        if seed is not None:
            self.rng = np.random.default_rng(seed)
        self.t = 0
        # Start near reference with small randomization
        X0 = self.X_ref + self.rng.multivariate_normal(np.zeros(3), 0.1 * self.Sigma)
        self.X = np.asarray(X0, float)
        self.h = 0.0
        return self._obs(), {}

    def step(self, action):
        # Accept scalar or shape-(1,) actions, then clip to the action bounds
        action = np.asarray(action, dtype=float).reshape(-1)
        u = float(np.clip(action[0], -self.action_max, self.action_max))
        # Next factors
        eps = self.rng.multivariate_normal(np.zeros(3), self.Sigma)
        X_next = self.c + self.Phi @ self.X + eps
        # NII uses h_t (pre-trade) in our convention (common in discrete-time setups)
        nii = self._nii_one_step(self.X, X_next, self.h)
        # Update hedge
        h_next = self.h + u
        # Reward: negative quadratic "cost"
        reward = -(nii**2 + self.lambda_h * (self.h**2) + self.lambda_u * (u**2))
        # Transition
        self.X = X_next
        self.h = h_next
        self.t += 1
        terminated = False
        truncated = (self.t >= self.episode_len)
        info = {"nii": nii, "h": self.h, "u": u}
        return self._obs(), float(reward), terminated, truncated, info
In the baseline linear–quadratic setting, classical LQ control has a structural advantage:
- linear dynamics,
- quadratic objective,
- continuous actions.
So it is expected to be near-optimal.
RL becomes compelling when we introduce realistic features that break LQ assumptions, such as:
- non-quadratic transaction costs (L1),
- no-trade regions / discrete trading,
- regime switching or nonlinearities,
- tail-risk objectives.
We start by implementing an RL agent in the same environment, then extend the objective to L1 costs, where RL can be expected to have an advantage over linear feedback.
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
np.random.seed(123)
# Fit VAR(1) once
c_hat, Phi_hat, Sigma_hat = fit_var1(X_smooth)
X_ref = X_smooth.mean(axis=0)
# Use the same lambdas previously selected for LQ benchmark
lambda_u_rl = float(lambda_u_star)
lambda_h_rl = float(lambda_h_star)
def make_env():
    return IrrbbNiiHedgeEnv(
        cm=cm,
        c_hat=c_hat, Phi_hat=Phi_hat, Sigma_hat=Sigma_hat, X_ref=X_ref,
        A_notional=100.0, L_notional=100.0,
        tau_A=3.0, tau_L=1.0, dt=1/12,
        lambda_u=lambda_u_rl, lambda_h=lambda_h_rl,
        action_max=50,
        episode_len=240,  # 20 years monthly
        center_state=True,
        seed=123
    )

vec_env = make_vec_env(make_env, n_envs=8)  # parallel rollouts

model = SAC(
    "MlpPolicy",
    vec_env,
    verbose=1,
    learning_rate=3e-4,
    batch_size=256,
    buffer_size=200_000,
    train_freq=1,
    gradient_steps=1,
    gamma=0.99,
    tau=0.005,
)
model.learn(total_timesteps=300_000)

# Optionally save the trained model
model.save("sac_irrbb_nii_hedge")
Using cpu device
SAC training log (ep_len_mean = 240 and learning_rate = 0.0003 throughout):
| total_timesteps | episodes | fps | time_elapsed | ep_rew_mean | actor_loss | critic_loss | ent_coef | ent_coef_loss | n_updates |
|---|---|---|---|---|---|---|---|---|---|
| 1920 | 4 | 160 | 11 | -364 | 13.9 | 7.2 | 1.03 | 0.0366 | 227 |
| 3840 | 12 | 149 | 25 | -328 | 13.3 | 8.89 | 1.05 | 0.0462 | 467 |
| 5760 | 20 | 149 | 38 | -287 | 11.4 | 5.19 | 0.968 | -0.0485 | 707 |
| 7680 | 28 | 151 | 50 | -265 | 9.16 | 3.4 | 0.893 | -0.166 | 947 |
| 9600 | 36 | 151 | 63 | -249 | 8.58 | 1.87 | 0.83 | -0.261 | 1187 |
| 11520 | 44 | 149 | 77 | -237 | 6.35 | 0.766 | 0.772 | -0.313 | 1427 |
| 13440 | 52 | 147 | 91 | -223 | 7.99 | 1.25 | 0.722 | -0.44 | 1667 |
| 15360 | 60 | 146 | 105 | -209 | 6.91 | 0.428 | 0.68 | -0.337 | 1907 |
| 17280 | 68 | 146 | 117 | -194 | 4.66 | 0.27 | 0.64 | -0.53 | 2147 |
| 19200 | 76 | 147 | 130 | -181 | 6.84 | 0.634 | 0.604 | -0.172 | 2387 |
| 21120 | 84 | 147 | 143 | -169 | 5.26 | 0.668 | 0.571 | -0.357 | 2627 |
| 23040 | 92 | 146 | 156 | -159 | 4.84 | 0.262 | 0.537 | -0.665 | 2867 |
| 24960 | 100 | 146 | 170 | -143 | 5.05 | 0.324 | 0.507 | -0.545 | 3107 |
| 26880 | 108 | 146 | 182 | -119 | 5.13 | 0.181 | 0.476 | -0.771 | 3347 |
| 28800 | 116 | 147 | 195 | -103 | 5.3 | 1.33 | 0.447 | -0.688 | 3587 |
| 30720 | 124 | 147 | 208 | -90.7 | 5.13 | 0.232 | 0.418 | -0.789 | 3827 |
| 32640 | 132 | 146 | 222 | -79.1 | 4.55 | 0.394 | 0.392 | -0.777 | 4067 |
| 34560 | 140 | 147 | 234 | -67.7 | 6.2 | 0.201 | 0.366 | -0.156 | 4307 |
| 36480 | 148 | 146 | 248 | -58.2 | 6.65 | 0.47 | 0.344 | -0.55 | 4547 |
| 38400 | 156 | 146 | 262 | -51.5 | 6.42 | 0.444 | 0.322 | -1.1 | 4787 |
| 40320 | 164 | 145 | 276 | -46.8 | 5.39 | 0.0763 | 0.303 | -1.02 | 5027 |
| 42240 | 172 | 145 | 290 | -44.4 | 5.65 | 0.192 | 0.284 | -1.1 | 5267 |
| 44160 | 180 | 144 | 304 | -42.6 | 5.64 | 0.102 | 0.266 | -1.18 | 5507 |
| 46080 | 188 | 144 | 318 | -41.2 | 6.13 | 0.219 | 0.249 | -0.913 | 5747 |
| 48000 | 196 | 144 | 332 | -39.8 | 6.32 | 0.114 | 0.234 | -1.18 | 5987 |
| 49920 | 204 | 144 | 346 | -38.4 | 5.89 | 0.0893 | 0.219 | -1.14 | 6227 |
| 51840 | 212 | 143 | 360 | -36.9 | 5.78 | 0.0792 | 0.205 | -1.21 | 6467 |
| 53760 | 220 | 143 | 373 | -35.2 | 6.01 | 0.0382 | 0.192 | -0.709 | 6707 |
| 55680 | 228 | 143 | 387 | -33.9 | 7.14 | 0.104 | 0.181 | 0.248 | 6947 |
| 57600 | 236 | 143 | 401 | -32.4 | 6.59 | 0.334 | 0.17 | -1.38 | 7187 |
| 59520 | 244 | 143 | 415 | -30.8 | 6.44 | 0.0724 | 0.16 | -1.05 | 7427 |
| 61440 | 252 | 143 | 429 | -29.2 | 6.39 | 0.0203 | 0.151 | -1.17 | 7667 |
| 63360 | 260 | 142 | 443 | -27.7 | 6.2 | 0.0613 | 0.147 | -1.09 | 7907 |
| 65280 | 268 | 142 | 457 | -26.3 | 6.47 | 0.0402 | 0.139 | -0.771 | 8147 |
| 67200 | 276 | 142 | 471 | -24.9 | 6.97 | 0.0437 | 0.131 | -0.15 | 8387 |
| 69120 | 284 | 142 | 485 | -23.5 | 6.79 | 0.0115 | 0.124 | -0.108 | 8627 |
| 71040 | 292 | 142 | 499 | -22.2 | 6.79 | 0.0209 | 0.118 | -1.2 | 8867 |
| 72960 | 300 | 142 | 513 | -20.9 | 7.05 | 0.0253 | 0.111 | -0.654 | 9107 |
| 74880 | 308 | 142 | 527 | -19.7 | 6.78 | 0.0151 | 0.106 | -0.997 | 9347 |
| 76800 | 316 | 141 | 540 | -18.7 | 7.12 | 0.122 | 0.101 | -0.565 | 9587 |
ep_rew_mean | -18.7 | | time/ | | | episodes | 320 | | fps | 141 | | time_elapsed | 540 | | total_timesteps | 76800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -17.7 | | time/ | | | episodes | 324 | | fps | 141 | | time_elapsed | 555 | | total_timesteps | 78720 | | train/ | | | actor_loss | 7 | | critic_loss | 0.0235 | | ent_coef | 0.096 | | ent_coef_loss | -0.577 | | learning_rate | 0.0003 | | n_updates | 9827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -17.7 | | time/ | | | episodes | 328 | | fps | 141 | | time_elapsed | 555 | | total_timesteps | 78720 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -16.9 | | time/ | | | episodes | 332 | | fps | 141 | | time_elapsed | 569 | | total_timesteps | 80640 | | train/ | | | actor_loss | 7.03 | | critic_loss | 0.0226 | | ent_coef | 0.0925 | | ent_coef_loss | -0.775 | | learning_rate | 0.0003 | | n_updates | 10067 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -16.9 | | time/ | | | episodes | 336 | | fps | 141 | | time_elapsed | 569 | | total_timesteps | 80640 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -16 | | time/ | | | episodes | 340 | | fps | 141 | | time_elapsed | 584 | | total_timesteps | 82560 | | train/ | | | actor_loss | 7.2 | | critic_loss | 0.0333 | | ent_coef | 0.0903 | | ent_coef_loss | 0.455 | | learning_rate | 0.0003 | | n_updates | 10307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -16 | | time/ | | | episodes | 344 | | fps | 141 | | time_elapsed | 584 | | total_timesteps | 82560 | --------------------------------- --------------------------------- 
| rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -15.2 | | time/ | | | episodes | 348 | | fps | 141 | | time_elapsed | 597 | | total_timesteps | 84480 | | train/ | | | actor_loss | 7.24 | | critic_loss | 0.112 | | ent_coef | 0.0884 | | ent_coef_loss | 0.627 | | learning_rate | 0.0003 | | n_updates | 10547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -15.2 | | time/ | | | episodes | 352 | | fps | 141 | | time_elapsed | 597 | | total_timesteps | 84480 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -14.6 | | time/ | | | episodes | 356 | | fps | 141 | | time_elapsed | 610 | | total_timesteps | 86400 | | train/ | | | actor_loss | 7.05 | | critic_loss | 0.00427 | | ent_coef | 0.0848 | | ent_coef_loss | -0.756 | | learning_rate | 0.0003 | | n_updates | 10787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -14.6 | | time/ | | | episodes | 360 | | fps | 141 | | time_elapsed | 610 | | total_timesteps | 86400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -14 | | time/ | | | episodes | 364 | | fps | 141 | | time_elapsed | 623 | | total_timesteps | 88320 | | train/ | | | actor_loss | 7.23 | | critic_loss | 0.0206 | | ent_coef | 0.0808 | | ent_coef_loss | 0.518 | | learning_rate | 0.0003 | | n_updates | 11027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -14 | | time/ | | | episodes | 368 | | fps | 141 | | time_elapsed | 623 | | total_timesteps | 88320 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -13.3 | | time/ | | | episodes | 372 | | fps | 141 | | time_elapsed | 636 | | total_timesteps | 90240 | | train/ | | | 
actor_loss | 7.26 | | critic_loss | 0.0166 | | ent_coef | 0.0781 | | ent_coef_loss | -0.659 | | learning_rate | 0.0003 | | n_updates | 11267 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -13.3 | | time/ | | | episodes | 376 | | fps | 141 | | time_elapsed | 636 | | total_timesteps | 90240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -12.7 | | time/ | | | episodes | 380 | | fps | 141 | | time_elapsed | 649 | | total_timesteps | 92160 | | train/ | | | actor_loss | 7.15 | | critic_loss | 0.0058 | | ent_coef | 0.0742 | | ent_coef_loss | -0.544 | | learning_rate | 0.0003 | | n_updates | 11507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -12.7 | | time/ | | | episodes | 384 | | fps | 141 | | time_elapsed | 649 | | total_timesteps | 92160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -12.2 | | time/ | | | episodes | 388 | | fps | 142 | | time_elapsed | 661 | | total_timesteps | 94080 | | train/ | | | actor_loss | 7.31 | | critic_loss | 0.00965 | | ent_coef | 0.0722 | | ent_coef_loss | -0.68 | | learning_rate | 0.0003 | | n_updates | 11747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -12.2 | | time/ | | | episodes | 392 | | fps | 142 | | time_elapsed | 661 | | total_timesteps | 94080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -11.6 | | time/ | | | episodes | 396 | | fps | 142 | | time_elapsed | 675 | | total_timesteps | 96000 | | train/ | | | actor_loss | 7.38 | | critic_loss | 0.0113 | | ent_coef | 0.0705 | | ent_coef_loss | -0.345 | | learning_rate | 0.0003 | | n_updates | 11987 | 
--------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -11.6 | | time/ | | | episodes | 400 | | fps | 142 | | time_elapsed | 675 | | total_timesteps | 96000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -11.2 | | time/ | | | episodes | 404 | | fps | 142 | | time_elapsed | 687 | | total_timesteps | 97920 | | train/ | | | actor_loss | 7.48 | | critic_loss | 0.0488 | | ent_coef | 0.0674 | | ent_coef_loss | -0.0736 | | learning_rate | 0.0003 | | n_updates | 12227 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -11.2 | | time/ | | | episodes | 408 | | fps | 142 | | time_elapsed | 687 | | total_timesteps | 97920 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -10.8 | | time/ | | | episodes | 412 | | fps | 142 | | time_elapsed | 700 | | total_timesteps | 99840 | | train/ | | | actor_loss | 7.58 | | critic_loss | 0.069 | | ent_coef | 0.0645 | | ent_coef_loss | -0.686 | | learning_rate | 0.0003 | | n_updates | 12467 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -10.8 | | time/ | | | episodes | 416 | | fps | 142 | | time_elapsed | 700 | | total_timesteps | 99840 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -10.4 | | time/ | | | episodes | 420 | | fps | 142 | | time_elapsed | 714 | | total_timesteps | 101760 | | train/ | | | actor_loss | 7.41 | | critic_loss | 0.0121 | | ent_coef | 0.0643 | | ent_coef_loss | -0.784 | | learning_rate | 0.0003 | | n_updates | 12707 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -10.4 | | time/ | | | episodes | 424 | | fps | 142 | 
| time_elapsed | 714 | | total_timesteps | 101760 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -9.98 | | time/ | | | episodes | 428 | | fps | 142 | | time_elapsed | 726 | | total_timesteps | 103680 | | train/ | | | actor_loss | 7.39 | | critic_loss | 0.629 | | ent_coef | 0.0624 | | ent_coef_loss | 0.353 | | learning_rate | 0.0003 | | n_updates | 12947 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -9.98 | | time/ | | | episodes | 432 | | fps | 142 | | time_elapsed | 726 | | total_timesteps | 103680 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -9.67 | | time/ | | | episodes | 436 | | fps | 142 | | time_elapsed | 739 | | total_timesteps | 105600 | | train/ | | | actor_loss | 7.32 | | critic_loss | 0.00688 | | ent_coef | 0.0611 | | ent_coef_loss | -0.114 | | learning_rate | 0.0003 | | n_updates | 13187 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -9.67 | | time/ | | | episodes | 440 | | fps | 142 | | time_elapsed | 739 | | total_timesteps | 105600 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -9.26 | | time/ | | | episodes | 444 | | fps | 142 | | time_elapsed | 752 | | total_timesteps | 107520 | | train/ | | | actor_loss | 7.29 | | critic_loss | 0.0128 | | ent_coef | 0.0593 | | ent_coef_loss | 0.132 | | learning_rate | 0.0003 | | n_updates | 13427 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -9.26 | | time/ | | | episodes | 448 | | fps | 142 | | time_elapsed | 752 | | total_timesteps | 107520 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean 
| -8.92 | | time/ | | | episodes | 452 | | fps | 143 | | time_elapsed | 764 | | total_timesteps | 109440 | | train/ | | | actor_loss | 7.33 | | critic_loss | 0.0143 | | ent_coef | 0.0596 | | ent_coef_loss | -0.279 | | learning_rate | 0.0003 | | n_updates | 13667 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -8.92 | | time/ | | | episodes | 456 | | fps | 143 | | time_elapsed | 764 | | total_timesteps | 109440 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -8.55 | | time/ | | | episodes | 460 | | fps | 143 | | time_elapsed | 776 | | total_timesteps | 111360 | | train/ | | | actor_loss | 7.47 | | critic_loss | 0.0226 | | ent_coef | 0.0572 | | ent_coef_loss | -0.39 | | learning_rate | 0.0003 | | n_updates | 13907 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -8.55 | | time/ | | | episodes | 464 | | fps | 143 | | time_elapsed | 776 | | total_timesteps | 111360 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -8.28 | | time/ | | | episodes | 468 | | fps | 143 | | time_elapsed | 789 | | total_timesteps | 113280 | | train/ | | | actor_loss | 7.29 | | critic_loss | 0.00283 | | ent_coef | 0.0557 | | ent_coef_loss | -0.129 | | learning_rate | 0.0003 | | n_updates | 14147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -8.28 | | time/ | | | episodes | 472 | | fps | 143 | | time_elapsed | 789 | | total_timesteps | 113280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -8.05 | | time/ | | | episodes | 476 | | fps | 143 | | time_elapsed | 802 | | total_timesteps | 115200 | | train/ | | | actor_loss | 7.58 | | critic_loss | 0.0118 | | 
ent_coef | 0.0541 | | ent_coef_loss | 0.0287 | | learning_rate | 0.0003 | | n_updates | 14387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -8.05 | | time/ | | | episodes | 480 | | fps | 143 | | time_elapsed | 802 | | total_timesteps | 115200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.82 | | time/ | | | episodes | 484 | | fps | 143 | | time_elapsed | 815 | | total_timesteps | 117120 | | train/ | | | actor_loss | 7.46 | | critic_loss | 0.0132 | | ent_coef | 0.0524 | | ent_coef_loss | -0.226 | | learning_rate | 0.0003 | | n_updates | 14627 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.82 | | time/ | | | episodes | 488 | | fps | 143 | | time_elapsed | 815 | | total_timesteps | 117120 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.64 | | time/ | | | episodes | 492 | | fps | 143 | | time_elapsed | 827 | | total_timesteps | 119040 | | train/ | | | actor_loss | 7.56 | | critic_loss | 0.148 | | ent_coef | 0.0515 | | ent_coef_loss | -0.0242 | | learning_rate | 0.0003 | | n_updates | 14867 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.64 | | time/ | | | episodes | 496 | | fps | 143 | | time_elapsed | 827 | | total_timesteps | 119040 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.45 | | time/ | | | episodes | 500 | | fps | 143 | | time_elapsed | 841 | | total_timesteps | 120960 | | train/ | | | actor_loss | 7.47 | | critic_loss | 0.0158 | | ent_coef | 0.0504 | | ent_coef_loss | -0.282 | | learning_rate | 0.0003 | | n_updates | 15107 | --------------------------------- --------------------------------- | 
rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.45 | | time/ | | | episodes | 504 | | fps | 143 | | time_elapsed | 841 | | total_timesteps | 120960 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.2 | | time/ | | | episodes | 508 | | fps | 143 | | time_elapsed | 854 | | total_timesteps | 122880 | | train/ | | | actor_loss | 7.6 | | critic_loss | 0.0597 | | ent_coef | 0.0492 | | ent_coef_loss | 0.14 | | learning_rate | 0.0003 | | n_updates | 15347 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.2 | | time/ | | | episodes | 512 | | fps | 143 | | time_elapsed | 854 | | total_timesteps | 122880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.06 | | time/ | | | episodes | 516 | | fps | 143 | | time_elapsed | 867 | | total_timesteps | 124800 | | train/ | | | actor_loss | 7.38 | | critic_loss | 0.00402 | | ent_coef | 0.0501 | | ent_coef_loss | -0.134 | | learning_rate | 0.0003 | | n_updates | 15587 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -7.06 | | time/ | | | episodes | 520 | | fps | 143 | | time_elapsed | 867 | | total_timesteps | 124800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.91 | | time/ | | | episodes | 524 | | fps | 144 | | time_elapsed | 879 | | total_timesteps | 126720 | | train/ | | | actor_loss | 7.49 | | critic_loss | 0.0059 | | ent_coef | 0.0488 | | ent_coef_loss | -0.118 | | learning_rate | 0.0003 | | n_updates | 15827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.91 | | time/ | | | episodes | 528 | | fps | 144 | | time_elapsed | 879 | | total_timesteps | 126720 | 
--------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.75 | | time/ | | | episodes | 532 | | fps | 144 | | time_elapsed | 892 | | total_timesteps | 128640 | | train/ | | | actor_loss | 7.45 | | critic_loss | 0.0508 | | ent_coef | 0.0479 | | ent_coef_loss | -0.327 | | learning_rate | 0.0003 | | n_updates | 16067 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.75 | | time/ | | | episodes | 536 | | fps | 144 | | time_elapsed | 892 | | total_timesteps | 128640 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.66 | | time/ | | | episodes | 540 | | fps | 144 | | time_elapsed | 906 | | total_timesteps | 130560 | | train/ | | | actor_loss | 7.45 | | critic_loss | 0.0124 | | ent_coef | 0.0477 | | ent_coef_loss | -0.0698 | | learning_rate | 0.0003 | | n_updates | 16307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.66 | | time/ | | | episodes | 544 | | fps | 144 | | time_elapsed | 906 | | total_timesteps | 130560 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.57 | | time/ | | | episodes | 548 | | fps | 143 | | time_elapsed | 920 | | total_timesteps | 132480 | | train/ | | | actor_loss | 7.45 | | critic_loss | 0.0293 | | ent_coef | 0.0469 | | ent_coef_loss | 0.838 | | learning_rate | 0.0003 | | n_updates | 16547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.57 | | time/ | | | episodes | 552 | | fps | 143 | | time_elapsed | 920 | | total_timesteps | 132480 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.52 | | time/ | | | episodes | 556 | | fps | 
143 | | time_elapsed | 934 | | total_timesteps | 134400 | | train/ | | | actor_loss | 7.5 | | critic_loss | 0.00633 | | ent_coef | 0.0472 | | ent_coef_loss | -0.0963 | | learning_rate | 0.0003 | | n_updates | 16787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.52 | | time/ | | | episodes | 560 | | fps | 143 | | time_elapsed | 934 | | total_timesteps | 134400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.43 | | time/ | | | episodes | 564 | | fps | 143 | | time_elapsed | 947 | | total_timesteps | 136320 | | train/ | | | actor_loss | 7.68 | | critic_loss | 0.0236 | | ent_coef | 0.0472 | | ent_coef_loss | 0.00233 | | learning_rate | 0.0003 | | n_updates | 17027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.43 | | time/ | | | episodes | 568 | | fps | 143 | | time_elapsed | 947 | | total_timesteps | 136320 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.27 | | time/ | | | episodes | 572 | | fps | 143 | | time_elapsed | 960 | | total_timesteps | 138240 | | train/ | | | actor_loss | 7.68 | | critic_loss | 0.0521 | | ent_coef | 0.0467 | | ent_coef_loss | -0.279 | | learning_rate | 0.0003 | | n_updates | 17267 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.27 | | time/ | | | episodes | 576 | | fps | 143 | | time_elapsed | 960 | | total_timesteps | 138240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.25 | | time/ | | | episodes | 580 | | fps | 143 | | time_elapsed | 973 | | total_timesteps | 140160 | | train/ | | | actor_loss | 7.85 | | critic_loss | 0.016 | | ent_coef | 0.0497 | | ent_coef_loss | 1.41 | | 
learning_rate | 0.0003 | | n_updates | 17507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.25 | | time/ | | | episodes | 584 | | fps | 143 | | time_elapsed | 973 | | total_timesteps | 140160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.24 | | time/ | | | episodes | 588 | | fps | 143 | | time_elapsed | 987 | | total_timesteps | 142080 | | train/ | | | actor_loss | 7.38 | | critic_loss | 0.0135 | | ent_coef | 0.0501 | | ent_coef_loss | -0.193 | | learning_rate | 0.0003 | | n_updates | 17747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.24 | | time/ | | | episodes | 592 | | fps | 143 | | time_elapsed | 987 | | total_timesteps | 142080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.22 | | time/ | | | episodes | 596 | | fps | 143 | | time_elapsed | 1001 | | total_timesteps | 144000 | | train/ | | | actor_loss | 7.53 | | critic_loss | 0.00569 | | ent_coef | 0.0503 | | ent_coef_loss | -0.0861 | | learning_rate | 0.0003 | | n_updates | 17987 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.22 | | time/ | | | episodes | 600 | | fps | 143 | | time_elapsed | 1001 | | total_timesteps | 144000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.17 | | time/ | | | episodes | 604 | | fps | 143 | | time_elapsed | 1015 | | total_timesteps | 145920 | | train/ | | | actor_loss | 7.34 | | critic_loss | 0.00152 | | ent_coef | 0.0497 | | ent_coef_loss | 0.172 | | learning_rate | 0.0003 | | n_updates | 18227 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | 
ep_rew_mean | -6.17 | | time/ | | | episodes | 608 | | fps | 143 | | time_elapsed | 1015 | | total_timesteps | 145920 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.12 | | time/ | | | episodes | 612 | | fps | 143 | | time_elapsed | 1028 | | total_timesteps | 147840 | | train/ | | | actor_loss | 7.39 | | critic_loss | 0.0157 | | ent_coef | 0.0498 | | ent_coef_loss | 0.0855 | | learning_rate | 0.0003 | | n_updates | 18467 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.12 | | time/ | | | episodes | 616 | | fps | 143 | | time_elapsed | 1028 | | total_timesteps | 147840 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.1 | | time/ | | | episodes | 620 | | fps | 143 | | time_elapsed | 1040 | | total_timesteps | 149760 | | train/ | | | actor_loss | 7.61 | | critic_loss | 0.0194 | | ent_coef | 0.0501 | | ent_coef_loss | 0.143 | | learning_rate | 0.0003 | | n_updates | 18707 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.1 | | time/ | | | episodes | 624 | | fps | 143 | | time_elapsed | 1040 | | total_timesteps | 149760 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.08 | | time/ | | | episodes | 628 | | fps | 143 | | time_elapsed | 1054 | | total_timesteps | 151680 | | train/ | | | actor_loss | 7.36 | | critic_loss | 0.00183 | | ent_coef | 0.0494 | | ent_coef_loss | -0.182 | | learning_rate | 0.0003 | | n_updates | 18947 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.08 | | time/ | | | episodes | 632 | | fps | 143 | | time_elapsed | 1054 | | total_timesteps | 151680 | --------------------------------- 
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.05 | | time/ | | | episodes | 636 | | fps | 143 | | time_elapsed | 1068 | | total_timesteps | 153600 | | train/ | | | actor_loss | 7.47 | | critic_loss | 0.015 | | ent_coef | 0.0504 | | ent_coef_loss | 0.339 | | learning_rate | 0.0003 | | n_updates | 19187 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.05 | | time/ | | | episodes | 640 | | fps | 143 | | time_elapsed | 1068 | | total_timesteps | 153600 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.05 | | time/ | | | episodes | 644 | | fps | 143 | | time_elapsed | 1082 | | total_timesteps | 155520 | | train/ | | | actor_loss | 7.66 | | critic_loss | 0.0346 | | ent_coef | 0.0495 | | ent_coef_loss | 0.308 | | learning_rate | 0.0003 | | n_updates | 19427 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.05 | | time/ | | | episodes | 648 | | fps | 143 | | time_elapsed | 1082 | | total_timesteps | 155520 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.05 | | time/ | | | episodes | 652 | | fps | 143 | | time_elapsed | 1096 | | total_timesteps | 157440 | | train/ | | | actor_loss | 7.51 | | critic_loss | 0.00487 | | ent_coef | 0.0498 | | ent_coef_loss | -0.124 | | learning_rate | 0.0003 | | n_updates | 19667 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.05 | | time/ | | | episodes | 656 | | fps | 143 | | time_elapsed | 1096 | | total_timesteps | 157440 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.05 | | time/ | | | episodes | 660 | | fps | 143 | | time_elapsed | 1110 | | 
total_timesteps | 159360 | | train/ | | | actor_loss | 7.51 | | critic_loss | 0.012 | | ent_coef | 0.0489 | | ent_coef_loss | -0.09 | | learning_rate | 0.0003 | | n_updates | 19907 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.05 | | time/ | | | episodes | 664 | | fps | 143 | | time_elapsed | 1110 | | total_timesteps | 159360 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.17 | | time/ | | | episodes | 668 | | fps | 143 | | time_elapsed | 1124 | | total_timesteps | 161280 | | train/ | | | actor_loss | 7.45 | | critic_loss | 0.0161 | | ent_coef | 0.0517 | | ent_coef_loss | -0.451 | | learning_rate | 0.0003 | | n_updates | 20147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.17 | | time/ | | | episodes | 672 | | fps | 143 | | time_elapsed | 1124 | | total_timesteps | 161280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.18 | | time/ | | | episodes | 676 | | fps | 143 | | time_elapsed | 1138 | | total_timesteps | 163200 | | train/ | | | actor_loss | 7.7 | | critic_loss | 0.0125 | | ent_coef | 0.0515 | | ent_coef_loss | 0.601 | | learning_rate | 0.0003 | | n_updates | 20387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.18 | | time/ | | | episodes | 680 | | fps | 143 | | time_elapsed | 1138 | | total_timesteps | 163200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -6.18 | | time/ | | | episodes | 684 | | fps | 143 | | time_elapsed | 1152 | | total_timesteps | 165120 | | train/ | | | actor_loss | 7.44 | | critic_loss | 0.00353 | | ent_coef | 0.0509 | | ent_coef_loss | 0.0839 | | learning_rate | 0.0003 | | 
n_updates | 20627 | ---------------------------------
Training log, one row per logged entry (ep_len_mean = 240 and learning_rate = 0.0003 in every entry; train/ columns are "-" for rollout-only entries):
| episodes | ep_rew_mean | fps | time_elapsed | total_timesteps | actor_loss | critic_loss | ent_coef | ent_coef_loss | n_updates |
|----------|-------------|-----|--------------|-----------------|------------|-------------|----------|---------------|-----------|
| 688 | -6.18 | 143 | 1152 | 165120 | - | - | - | - | - |
| 692 | -6.16 | 143 | 1166 | 167040 | 7.43 | 0.0102 | 0.0501 | 0.228 | 20867 |
| 696 | -6.16 | 143 | 1166 | 167040 | - | - | - | - | - |
| 700 | -6.17 | 143 | 1180 | 168960 | 7.39 | 0.00579 | 0.0498 | -0.215 | 21107 |
| 704 | -6.17 | 143 | 1180 | 168960 | - | - | - | - | - |
| 708 | -6.22 | 143 | 1194 | 170880 | 7.38 | 0.00426 | 0.0496 | -0.322 | 21347 |
| 712 | -6.22 | 143 | 1194 | 170880 | - | - | - | - | - |
| 716 | -6.23 | 143 | 1208 | 172800 | 7.48 | 0.00267 | 0.0488 | 0.0184 | 21587 |
| 720 | -6.23 | 143 | 1208 | 172800 | - | - | - | - | - |
| 724 | -6.23 | 142 | 1222 | 174720 | 7.44 | 0.00674 | 0.0478 | -0.0743 | 21827 |
| 728 | -6.23 | 142 | 1222 | 174720 | - | - | - | - | - |
| 732 | -6.18 | 142 | 1235 | 176640 | 7.46 | 0.0278 | 0.047 | -0.105 | 22067 |
| 736 | -6.18 | 142 | 1235 | 176640 | - | - | - | - | - |
| 740 | -6.2 | 142 | 1249 | 178560 | 7.58 | 0.0165 | 0.0494 | 0.0173 | 22307 |
| 744 | -6.2 | 142 | 1249 | 178560 | - | - | - | - | - |
| 748 | -6.24 | 142 | 1263 | 180480 | 7.58 | 0.0176 | 0.0508 | -0.13 | 22547 |
| 752 | -6.24 | 142 | 1263 | 180480 | - | - | - | - | - |
| 756 | -6.27 | 142 | 1277 | 182400 | 7.54 | 0.00306 | 0.0498 | -0.565 | 22787 |
| 760 | -6.27 | 142 | 1277 | 182400 | - | - | - | - | - |
| 764 | -6.31 | 142 | 1291 | 184320 | 7.36 | 0.00267 | 0.0498 | -0.253 | 23027 |
| 768 | -6.31 | 142 | 1291 | 184320 | - | - | - | - | - |
| 772 | -6.31 | 142 | 1305 | 186240 | 7.42 | 0.0029 | 0.0489 | -0.0549 | 23267 |
| 776 | -6.31 | 142 | 1305 | 186240 | - | - | - | - | - |
| 780 | -6.28 | 142 | 1318 | 188160 | 7.58 | 0.0527 | 0.0486 | -0.449 | 23507 |
| 784 | -6.28 | 142 | 1318 | 188160 | - | - | - | - | - |
| 788 | -6.25 | 142 | 1332 | 190080 | 7.58 | 0.0053 | 0.0479 | -0.279 | 23747 |
| 792 | -6.25 | 142 | 1332 | 190080 | - | - | - | - | - |
| 796 | -6.25 | 142 | 1345 | 192000 | 7.47 | 0.00191 | 0.0499 | 0.423 | 23987 |
| 800 | -6.25 | 142 | 1345 | 192000 | - | - | - | - | - |
| 804 | -6.26 | 142 | 1358 | 193920 | 7.42 | 0.00416 | 0.0493 | -0.283 | 24227 |
| 808 | -6.26 | 142 | 1358 | 193920 | - | - | - | - | - |
| 812 | -6.3 | 142 | 1372 | 195840 | 7.58 | 0.0139 | 0.0481 | -0.0726 | 24467 |
| 816 | -6.3 | 142 | 1372 | 195840 | - | - | - | - | - |
| 820 | -6.29 | 142 | 1385 | 197760 | 7.49 | 0.00338 | 0.0494 | -0.1 | 24707 |
| 824 | -6.29 | 142 | 1385 | 197760 | - | - | - | - | - |
| 828 | -6.3 | 142 | 1399 | 199680 | 7.4 | 0.00417 | 0.0484 | -0.0342 | 24947 |
| 832 | -6.3 | 142 | 1399 | 199680 | - | - | - | - | - |
| 836 | -6.3 | 142 | 1413 | 201600 | 7.43 | 0.0014 | 0.0473 | -0.293 | 25187 |
| 840 | -6.3 | 142 | 1413 | 201600 | - | - | - | - | - |
| 844 | -6.28 | 142 | 1427 | 203520 | 7.39 | 0.000722 | 0.0462 | -0.145 | 25427 |
| 848 | -6.28 | 142 | 1427 | 203520 | - | - | - | - | - |
| 852 | -6.25 | 142 | 1441 | 205440 | 7.38 | 0.000572 | 0.0454 | -0.0949 | 25667 |
| 856 | -6.25 | 142 | 1441 | 205440 | - | - | - | - | - |
| 860 | -6.15 | 142 | 1455 | 207360 | 7.37 | 0.000305 | 0.0448 | 0.109 | 25907 |
| 864 | -6.15 | 142 | 1455 | 207360 | - | - | - | - | - |
| 868 | -6.08 | 142 | 1470 | 209280 | 7.36 | 0.000282 | 0.0443 | 0.00955 | 26147 |
| 872 | -6.08 | 142 | 1470 | 209280 | - | - | - | - | - |
| 876 | -5.98 | 142 | 1484 | 211200 | 7.35 | 0.000268 | 0.0438 | -0.114 | 26387 |
| 880 | -5.98 | 142 | 1484 | 211200 | - | - | - | - | - |
| 884 | -5.91 | 142 | 1497 | 213120 | 7.35 | 0.000326 | 0.0435 | -0.033 | 26627 |
| 888 | -5.91 | 142 | 1497 | 213120 | - | - | - | - | - |
| 892 | -5.85 | 142 | 1511 | 215040 | 7.35 | 0.000296 | 0.0433 | 0.0545 | 26867 |
| 896 | -5.85 | 142 | 1511 | 215040 | - | - | - | - | - |
| 900 | -5.78 | 142 | 1525 | 216960 | 7.35 | 0.000291 | 0.043 | 0.000298 | 27107 |
| 904 | -5.78 | 142 | 1525 | 216960 | - | - | - | - | - |
| 908 | -5.71 | 142 | 1539 | 218880 | 7.33 | 0.00028 | 0.0429 | -0.0205 | 27347 |
| 912 | -5.71 | 142 | 1539 | 218880 | - | - | - | - | - |
| 916 | -5.65 | 142 | 1553 | 220800 | 7.33 | 0.000333 | 0.0428 | 0.368 | 27587 |
| 920 | -5.65 | 142 | 1553 | 220800 | - | - | - | - | - |
| 924 | -5.62 | 142 | 1566 | 222720 | 7.32 | 0.000254 | 0.0427 | 0.141 | 27827 |
| 928 | -5.62 | 142 | 1566 | 222720 | - | - | - | - | - |
| 932 | -5.59 | 142 | 1580 | 224640 | 7.31 | 0.000329 | 0.0426 | -0.117 | 28067 |
| 936 | -5.59 | 142 | 1580 | 224640 | - | - | - | - | - |
| 940 | -5.54 | 142 | 1594 | 226560 | 7.31 | 0.000337 | 0.0426 | 0.0404 | 28307 |
| 944 | -5.54 | 142 | 1594 | 226560 | - | - | - | - | - |
| 948 | -5.45 | 142 | 1608 | 228480 | 7.31 | 0.000299 | 0.0426 | 0.135 | 28547 |
| 952 | -5.45 | 142 | 1608 | 228480 | - | - | - | - | - |
| 956 | -5.42 | 142 | 1621 | 230400 | 7.29 | 0.000316 | 0.0425 | -0.018 | 28787 |
| 960 | -5.42 | 142 | 1621 | 230400 | - | - | - | - | - |
| 964 | -5.39 | 142 | 1635 | 232320 | 7.3 | 0.000327 | 0.0425 | 0.0315 | 29027 |
| 968 | -5.39 | 142 | 1635 | 232320 | - | - | - | - | - |
| 972 | -5.4 | 141 | 1650 | 234240 | 7.29 | 0.000271 | 0.0425 | 0.109 | 29267 |
| 976 | -5.4 | 141 | 1650 | 234240 | - | - | - | - | - |
| 980 | -5.43 | 141 | 1664 | 236160 | 7.28 | 0.000301 | 0.0425 | 0.119 | 29507 |
| 984 | -5.43 | 141 | 1664 | 236160 | - | - | - | - | - |
| 988 | -5.41 | 141 | 1679 | 238080 | 7.28 | 0.000297 | 0.0423 | -0.188 | 29747 |
| 992 | -5.41 | 141 | 1679 | 238080 | - | - | - | - | - |
| 996 | -5.43 | 141 | 1693 | 240000 | 7.27 | 0.000293 | 0.0423 | 0.0253 | 29987 |
| 1000 | -5.43 | 141 | 1693 | 240000 | - | - | - | - | - |
| 1004 | -5.44 | 141 | 1707 | 241920 | 7.26 | 0.000309 | 0.0423 | 0.276 | 30227 |
| 1008 | -5.44 | 141 | 1707 | 241920 | - | - | - | - | - |
| 1012 | -5.45 | 141 | 1720 | 243840 | 7.26 | 0.000458 | 0.0423 | 0.0424 | 30467 |
| 1016 | -5.45 | 141 | 1720 | 243840 | - | - | - | - | - |
| 1020 | -5.46 | 141 | 1734 | 245760 | 7.24 | 0.000283 | 0.0424 | -0.207 | 30707 |
| 1024 | -5.46 | 141 | 1734 | 245760 | - | - | - | - | - |
| 1028 | -5.44 | 141 | 1748 | 247680 | 7.23 | 0.000302 | 0.0423 | -0.186 | 30947 |
| 1032 | -5.44 | 141 | 1748 | 247680 | - | - | - | - | - |
| 1036 | -5.46 | 141 | 1761 | 249600 | 7.23 | 0.000235 | 0.0422 | -0.0115 | 31187 |
| 1040 | -5.46 | 141 | 1761 | 249600 | - | - | - | - | - |
| 1044 | -5.47 | 141 | 1776 | 251520 | 7.2 | 0.000292 | 0.0422 | -0.165 | 31427 |
| 1048 | -5.47 | 141 | 1776 | 251520 | - | - | - | - | - |
| 1052 | -5.45 | 141 | 1790 | 253440 | 7.22 | 0.000259 | 0.042 | 0.432 | 31667 |
| 1056 | -5.45 | 141 | 1790 | 253440 | - | - | - | - | - |
| 1060 | -5.47 | 141 | 1804 | 255360 | 7.21 | 0.000319 | 0.0422 | 0.112 | 31907 |
| 1064 | -5.47 | 141 | 1804 | 255360 | - | - | - | - | - |
| 1068 | -5.46 | 141 | 1818 | 257280 | 7.21 | 0.000317 | 0.0422 | -0.0528 | 32147 |
| 1072 | -5.46 | 141 | 1818 | 257280 | - | - | - | - | - |
| 1076 | -5.47 | 141 | 1832 | 259200 | 7.2 | 0.000367 | 0.0421 | -0.0196 | 32387 |
| 1080 | -5.47 | 141 | 1832 | 259200 | - | - | - | - | - |
| 1084 | -5.47 | 141 | 1846 | 261120 | 7.19 | 0.000291 | 0.042 | 0.0255 | 32627 |
| 1088 | -5.47 | 141 | 1846 | 261120 | - | - | - | - | - |
| 1092 | -5.44 | 141 | 1860 | 263040 | 7.17 | 0.000566 | 0.0421 | -0.0782 | 32867 |
| 1096 | -5.44 | 141 | 1860 | 263040 | - | - | - | - | - |
| 1100 | -5.42 | 141 | 1874 | 264960 | 7.17 | 0.000302 | 0.042 | -0.136 | 33107 |
| 1104 | -5.42 | 141 | 1874 | 264960 | - | - | - | - | - |
| 1108 | -5.42 | 141 | 1889 | 266880 | 7.16 | 0.000271 | 0.042 | 0.015 | 33347 |
| 1112 | -5.42 | 141 | 1889 | 266880 | - | - | - | - | - |
| 1116 | -5.4 | 141 | 1903 | 268800 | 7.16 | 0.000378 | 0.042 | -0.425 | 33587 |
| 1120 | -5.4 | 141 | 1903 | 268800 | - | - | - | - | - |
| 1124 | -5.37 | 141 | 1916 | 270720 | 7.17 | 0.000309 | 0.0419 | -0.0434 | 33827 |
| 1128 | -5.37 | 141 | 1916 | 270720 | - | - | - | - | - |
| 1132 | -5.39 | 141 | 1930 | 272640 | 7.15 | 0.000247 | 0.0419 | -0.124 | 34067 |
| 1136 | -5.39 | 141 | 1930 | 272640 | - | - | - | - | - |
| 1140 | -5.4 | 141 | 1944 | 274560 | 7.14 | 0.00029 | 0.042 | -0.0588 | 34307 |
| 1144 | -5.4 | 141 | 1944 | 274560 | - | - | - | - | - |
| 1148 | -5.39 | 141 | 1957 | 276480 | 7.13 | 0.000277 | 0.0419 | 0.0706 | 34547 |
| 1152 | -5.39 | 141 | 1957 | 276480 | - | - | - | - | - |
| 1156 | -5.4 | 141 | 1970 | 278400 | 7.13 | 0.000232 | 0.0419 | -0.0838 | 34787 |
| 1160 | -5.4 | 141 | 1970 | 278400 | - | - | - | - | - |
| 1164 | -5.42 | 141 | 1983 | 280320 | 7.14 | 0.000388 | 0.0418 | 0.11 | 35027 |
| 1168 | -5.42 | 141 | 1983 | 280320 | - | - | - | - | - |
| 1172 | -5.46 | 141 | 1997 | 282240 | 7.13 | 0.00106 | 0.0417 | 0.0236 | 35267 |
| 1176 | -5.46 | 141 | 1997 | 282240 | - | - | - | - | - |
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.46 | | time/ | | | episodes | 1180 | | fps | 141 | | time_elapsed 
| 2011 | | total_timesteps | 284160 | | train/ | | | actor_loss | 7.11 | | critic_loss | 0.000352 | | ent_coef | 0.0418 | | ent_coef_loss | 0.176 | | learning_rate | 0.0003 | | n_updates | 35507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.46 | | time/ | | | episodes | 1184 | | fps | 141 | | time_elapsed | 2011 | | total_timesteps | 284160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.45 | | time/ | | | episodes | 1188 | | fps | 141 | | time_elapsed | 2024 | | total_timesteps | 286080 | | train/ | | | actor_loss | 7.1 | | critic_loss | 0.000222 | | ent_coef | 0.0418 | | ent_coef_loss | -0.113 | | learning_rate | 0.0003 | | n_updates | 35747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.45 | | time/ | | | episodes | 1192 | | fps | 141 | | time_elapsed | 2024 | | total_timesteps | 286080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.45 | | time/ | | | episodes | 1196 | | fps | 141 | | time_elapsed | 2038 | | total_timesteps | 288000 | | train/ | | | actor_loss | 7.09 | | critic_loss | 0.000216 | | ent_coef | 0.0417 | | ent_coef_loss | 0.0759 | | learning_rate | 0.0003 | | n_updates | 35987 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.45 | | time/ | | | episodes | 1200 | | fps | 141 | | time_elapsed | 2038 | | total_timesteps | 288000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.44 | | time/ | | | episodes | 1204 | | fps | 141 | | time_elapsed | 2054 | | total_timesteps | 289920 | | train/ | | | actor_loss | 7.09 | | critic_loss | 0.000308 | | ent_coef | 0.0417 | | ent_coef_loss | -0.172 | | 
learning_rate | 0.0003 | | n_updates | 36227 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.44 | | time/ | | | episodes | 1208 | | fps | 141 | | time_elapsed | 2054 | | total_timesteps | 289920 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.45 | | time/ | | | episodes | 1212 | | fps | 141 | | time_elapsed | 2068 | | total_timesteps | 291840 | | train/ | | | actor_loss | 7.08 | | critic_loss | 0.000277 | | ent_coef | 0.0417 | | ent_coef_loss | -0.0299 | | learning_rate | 0.0003 | | n_updates | 36467 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.45 | | time/ | | | episodes | 1216 | | fps | 141 | | time_elapsed | 2068 | | total_timesteps | 291840 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.46 | | time/ | | | episodes | 1220 | | fps | 141 | | time_elapsed | 2082 | | total_timesteps | 293760 | | train/ | | | actor_loss | 7.09 | | critic_loss | 0.00029 | | ent_coef | 0.0417 | | ent_coef_loss | -0.0631 | | learning_rate | 0.0003 | | n_updates | 36707 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.46 | | time/ | | | episodes | 1224 | | fps | 141 | | time_elapsed | 2082 | | total_timesteps | 293760 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.49 | | time/ | | | episodes | 1228 | | fps | 140 | | time_elapsed | 2097 | | total_timesteps | 295680 | | train/ | | | actor_loss | 7.07 | | critic_loss | 0.000261 | | ent_coef | 0.0417 | | ent_coef_loss | -0.122 | | learning_rate | 0.0003 | | n_updates | 36947 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 
| | ep_rew_mean | -5.49 | | time/ | | | episodes | 1232 | | fps | 140 | | time_elapsed | 2097 | | total_timesteps | 295680 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.47 | | time/ | | | episodes | 1236 | | fps | 140 | | time_elapsed | 2111 | | total_timesteps | 297600 | | train/ | | | actor_loss | 7.07 | | critic_loss | 0.000278 | | ent_coef | 0.0416 | | ent_coef_loss | 0.141 | | learning_rate | 0.0003 | | n_updates | 37187 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.47 | | time/ | | | episodes | 1240 | | fps | 140 | | time_elapsed | 2111 | | total_timesteps | 297600 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.46 | | time/ | | | episodes | 1244 | | fps | 140 | | time_elapsed | 2125 | | total_timesteps | 299520 | | train/ | | | actor_loss | 7.06 | | critic_loss | 0.00029 | | ent_coef | 0.0416 | | ent_coef_loss | -0.224 | | learning_rate | 0.0003 | | n_updates | 37427 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -5.46 | | time/ | | | episodes | 1248 | | fps | 140 | | time_elapsed | 2125 | | total_timesteps | 299520 | ---------------------------------
def nii_path_from_X_and_h(cm, X_path, h_path, A_notional=100.0, L_notional=100.0,
                          tau_A=3.0, tau_L=1.0, dt=1/12):
    """NII per period along an exogenous factor path X_path with hedge positions h_path."""
    def y(X, tau):
        # AFNS zero-coupon yield at a single maturity tau
        return float(afns_yields_from_factors(X, np.array([tau]), cm.lam, cm.sig1, cm.sig2, cm.sig3)[0])

    T = X_path.shape[0]
    NII = np.zeros(T-1)
    for t in range(T-1):
        X_t, X_next = X_path[t], X_path[t+1]
        yA = y(X_t, tau_A)
        yL = y(X_t, tau_L)
        # forward rate over [tau_L, tau_L + dt] implied by today's curve
        y2 = y(X_t, tau_L + dt)
        K_t = ((tau_L + dt) * y2 - tau_L * yL) / dt
        # tau_L yield observed next period (floating side of the hedge payoff)
        y_float_next = y(X_next, tau_L)
        NII[t] = (A_notional*yA*dt - L_notional*yL*dt + h_path[t]*(K_t - y_float_next)*dt)
    return NII
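The hedge strike `K_t` in the function above is the forward rate over `[tau_L, tau_L + dt]` implied by continuously compounded zero yields. As a minimal standalone check, assuming a hypothetical toy curve `toy_yield` in place of the AFNS curve, the sketch below confirms that the formula agrees with the ratio of discount factors and that a flat curve gives a forward rate equal to the yield:

```python
import math

def fwd_cc(y1, tau1, y2, tau2):
    """Continuously compounded forward rate between tau1 and tau2 (in years)."""
    return (tau2 * y2 - tau1 * y1) / (tau2 - tau1)

def toy_yield(tau):
    # Hypothetical upward-sloping curve, for illustration only (not AFNS).
    return 0.03 + 0.002 * tau

tau1, dt = 1.0, 1 / 12
tau2 = tau1 + dt
y1, y2 = toy_yield(tau1), toy_yield(tau2)
f = fwd_cc(y1, tau1, y2, tau2)

# Discount factors P(tau) = exp(-tau * y(tau)); the forward rate satisfies
# P(tau2) = P(tau1) * exp(-f * (tau2 - tau1)).
P1, P2 = math.exp(-tau1 * y1), math.exp(-tau2 * y2)
f_from_discount = -math.log(P2 / P1) / (tau2 - tau1)

# On a flat curve the forward rate equals the yield itself.
flat = fwd_cc(0.04, tau1, 0.04, tau2)

print(f, f_from_discount, flat)
```

On an upward-sloping curve the forward rate sits above both spot yields, which is why `K_t` generally differs from `yL` even before rates move.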
def rollout_policy_on_exogenous_X(cm, X_path, policy,
                                  K_lq=None, rl_model=None,
                                  X_ref=None, center_state=True,
                                  action_max=1.0,
                                  A_notional=100.0, L_notional=100.0,
                                  tau_A=3.0, tau_L=1.0, dt=1/12):
    """
    policy: "unhedged" | "lq" | "rl"
    Returns dict with h, u, NII.
    """
    T = X_path.shape[0]
    h = np.zeros(T)
    u = np.zeros(T-1)
    for t in range(T-1):
        if policy == "unhedged":
            u_t = 0.0
        elif policy == "lq":
            if K_lq is None:
                raise ValueError("Need K_lq for LQ.")
            x_t = np.array([X_path[t,0], X_path[t,1], X_path[t,2], h[t]], dtype=float)
            # K_lq @ x_t is an array, not a scalar: extract the element explicitly
            # (calling float() on an array with ndim > 0 is deprecated since NumPy 1.25)
            u_t = -float(np.ravel(K_lq @ x_t)[0])
            u_t = float(np.clip(u_t, -action_max, action_max))
        elif policy == "rl":
            if rl_model is None:
                raise ValueError("Need rl_model for RL.")
            obs = np.array([X_path[t,0], X_path[t,1], X_path[t,2], h[t]], dtype=np.float32)
            if center_state and (X_ref is not None):
                obs[:3] -= X_ref.astype(np.float32)
            act, _ = rl_model.predict(obs, deterministic=True)
            u_t = float(np.clip(float(act[0]), -action_max, action_max))
        else:
            raise ValueError("Unknown policy.")
        u[t] = u_t
        h[t+1] = h[t] + u_t
    NII = nii_path_from_X_and_h(cm, X_path, h, A_notional, L_notional, tau_A, tau_L, dt)
    return {"h": h, "u": u, "NII": NII}
def summarize(N):
    """Summary statistics of an NII path: std, 5% quantile, min, mean."""
    return {
        "std": float(np.std(N)),
        "p05": float(np.quantile(N, 0.05)),
        "min": float(np.min(N)),
        "mean": float(np.mean(N)),
    }
def compare_on_scenario(name, X_path, K_lq, rl_model):
    # cm and X_ref are read from the notebook's global scope
    res0 = rollout_policy_on_exogenous_X(cm, X_path, "unhedged", action_max=1.0, X_ref=X_ref)
    resL = rollout_policy_on_exogenous_X(cm, X_path, "lq", K_lq=K_lq, action_max=1.0, X_ref=X_ref)
    resR = rollout_policy_on_exogenous_X(cm, X_path, "rl", rl_model=rl_model, action_max=1.0, X_ref=X_ref)
    # summarize each NII path once
    s0, sL, sR = summarize(res0["NII"]), summarize(resL["NII"]), summarize(resR["NII"])
    row = {
        "scenario": name,
        "unhedged_std": s0["std"],
        "lq_std": sL["std"],
        "rl_std": sR["std"],
        "unhedged_p05": s0["p05"],
        "lq_p05": sL["p05"],
        "rl_p05": sR["p05"],
        # turnover = mean |u_t|, inventory = mean |h_t|
        "lq_turnover": float(np.mean(np.abs(resL["u"]))),
        "rl_turnover": float(np.mean(np.abs(resR["u"]))),
        "lq_inv": float(np.mean(np.abs(resL["h"]))),
        "rl_inv": float(np.mean(np.abs(resR["h"]))),
    }
    # plots (matplotlib.pyplot is imported as plt at the top of the notebook)
    plt.figure(figsize=(9,4))
    plt.plot(res0["NII"], label="Unhedged")
    plt.plot(resL["NII"], label="LQ")
    plt.plot(resR["NII"], label="RL (SAC)")
    plt.title(name)
    plt.xlabel("t (months)")
    plt.ylabel("NII")
    plt.legend()
    plt.tight_layout()
    plt.show()
    return row
# Build K_lq once using chosen lambdas and the same build_Hx_QR_from_nii + Riccati pipeline
H_x, Q_s, R = build_Hx_QR_from_nii(
    cm, X_ref, A_notional, L_notional, tau_A, tau_L, dt,
    alpha_nii=1.0, lambda_h=lambda_h_star, lambda_u=lambda_u_star
)
A_lq, B_lq = build_AB_from_Phi(Phi_hat)
P_lq, K_lq = solve_discrete_riccati(A_lq, B_lq, Q_s, R)
rows = []
rows.append(compare_on_scenario("Stress: +200bp parallel", X_parallel, K_lq, model))
rows.append(compare_on_scenario("Stress: bear steepener", X_steepen, K_lq, model))
rows.append(compare_on_scenario("Stress: high vol x3", X_highvol, K_lq, model))
import pandas as pd
df_compare = pd.DataFrame(rows)
df_compare
| | scenario | unhedged_std | lq_std | rl_std | unhedged_p05 | lq_p05 | rl_p05 | lq_turnover | rl_turnover | lq_inv | rl_inv |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Stress: +200bp parallel | 0.011522 | 0.010117 | 0.011566 | 0.021739 | 0.020354 | 0.021979 | 0.050177 | 0.005414 | 14.585011 | 1.776519 |
| 1 | Stress: bear steepener | 0.015246 | 0.013545 | 0.015247 | 0.021091 | 0.019772 | 0.021321 | 0.053618 | 0.005418 | 14.777920 | 1.569327 |
| 2 | Stress: high vol x3 | 0.031582 | 0.028799 | 0.031851 | -0.010008 | -0.010316 | -0.010063 | 0.090683 | 0.012188 | 13.971398 | 1.502951 |
compare_on_scenario("Stress: high vol x3", X_highvol, K_lq, model)
{'scenario': 'Stress: high vol x3',
'unhedged_std': 0.03158229651939372,
'lq_std': 0.028799311337273092,
'rl_std': 0.031850762210246804,
'unhedged_p05': -0.01000792378107949,
'lq_p05': -0.010316395757328036,
'rl_p05': -0.010062760747148576,
'lq_turnover': 0.09068329468290291,
'rl_turnover': 0.01218796659398962,
'lq_inv': 13.971398221022996,
'rl_inv': 1.5029505758307546}
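To read the comparison table more directly, the standard deviations can be turned into fractional reductions relative to the unhedged case. The sketch below copies the `*_std` values from the table above into a plain dictionary (the helper name `std_reduction` is introduced here for illustration only). In these runs LQ lowers the NII standard deviation by roughly 9 to 12 percent, while the RL (SAC) policy leaves it essentially unchanged, with far lower turnover and inventory.

```python
# Standard deviations copied from the comparison table above.
results = {
    "Stress: +200bp parallel": {"unhedged": 0.011522, "lq": 0.010117, "rl": 0.011566},
    "Stress: bear steepener": {"unhedged": 0.015246, "lq": 0.013545, "rl": 0.015247},
    "Stress: high vol x3": {"unhedged": 0.031582, "lq": 0.028799, "rl": 0.031851},
}

def std_reduction(unhedged, hedged):
    """Fractional reduction in NII standard deviation relative to unhedged."""
    return 1.0 - hedged / unhedged

# Positive values mean the policy reduced volatility; negative means it increased it.
reductions = {
    name: {k: std_reduction(v["unhedged"], v[k]) for k in ("lq", "rl")}
    for name, v in results.items()
}

for name, r in reductions.items():
    print(f"{name}: LQ {r['lq']:+.1%}, RL {r['rl']:+.1%}")
```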
L1 transaction costs (bid–ask / fees): a realistic non-quadratic objective
We replace the quadratic turnover penalty with an L1 cost:
$ \text{Cost}_t = \text{NII}_{t+1}^2 + \lambda_h h_t^2 + \kappa_u |u_t| $
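This change looks small but is economically meaningful:
- the marginal cost of a small trade is $\kappa_u$ rather than vanishing as $u_t \to 0$, as it does under a quadratic penalty,
- the optimal policy therefore often includes “no-trade” regions in which small adjustments are not worth their cost,
- the linear feedback rule from LQ is no longer guaranteed to be optimal, because the objective is not quadratic.
This is the main reason RL is included: it can learn sparse, inventory-aware trading under such frictions without requiring a quadratic objective.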
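The no-trade region can be seen in a deliberately simplified one-period problem. This is an assumption made for illustration and is not the notebook's environment: choose `u` to minimize `0.5*a*(h + u - h_target)**2 + kappa*abs(u)`, where `h_target` is a hypothetical desired position and `a` a hypothetical curvature. The minimizer is the soft-threshold of the gap `h_target - h` at `kappa / a`, so any gap smaller than that band triggers no trade, whereas the quadratic-cost analogue always trades a fraction of the gap.

```python
def l1_optimal_trade(h, h_target, a, kappa):
    """Minimizer of 0.5*a*(h + u - h_target)**2 + kappa*abs(u) over u (a > 0)."""
    gap = h_target - h
    band = kappa / a  # half-width of the no-trade region
    if abs(gap) <= band:
        return 0.0
    # Outside the band, trade toward the target but stop one band short of it.
    return gap - band if gap > 0 else gap + band

def quadratic_optimal_trade(h, h_target, a, lam):
    """Minimizer of 0.5*a*(h + u - h_target)**2 + lam*u**2: trades for any nonzero gap."""
    return a * (h_target - h) / (a + 2.0 * lam)

def l1_cost(u, h, h_target, a, kappa):
    """One-period objective with an L1 trading cost."""
    return 0.5 * a * (h + u - h_target) ** 2 + kappa * abs(u)

# Hypothetical numbers chosen only to make the band (kappa / a = 3) visible.
a, kappa, lam = 1e-4, 3e-4, 1e-4
for gap in (1.0, 2.9, 5.0):
    u_l1 = l1_optimal_trade(0.0, gap, a, kappa)
    u_q = quadratic_optimal_trade(0.0, gap, a, lam)
    print(f"gap={gap}: L1 trade={u_l1:.3f}, quadratic trade={u_q:.3f}")
```

In the multi-period environment the band is state dependent and has no closed form, which is where a learned policy becomes useful.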
class IrrbbNiiHedgeEnvL1(gym.Env):
    """
    IRRBB NII hedging environment with L1 transaction costs.
    State: [L, S, C, h] (AFNS factors + hedge inventory)
    Action: u in [-1,1], scaled to Δh via u_scale
    Reward:
        r_t = - [ NII_{t+1}^2 + lambda_h * h_t^2 + kappa_u * |u_t| ]
    This breaks LQ assumptions and favors sparse trading.
    """
    metadata = {"render_modes": []}

    def __init__(
        self,
        cm,
        c_hat,
        Phi_hat,
        Sigma_hat,
        X_ref,
        A_notional=100.0,
        L_notional=100.0,
        tau_A=3.0,
        tau_L=1.0,
        dt=1 / 12,
        lambda_h=1e-6,
        kappa_u=3e-4,
        u_scale=25.0,
        no_trade_eps=0.0,
        episode_len=240,
        center_state=True,
        seed=123,
    ):
        super().__init__()
        self.cm = cm
        self.c = np.asarray(c_hat, float)
        self.Phi = np.asarray(Phi_hat, float)
        self.Sigma = np.asarray(Sigma_hat, float)
        self.X_ref = np.asarray(X_ref, float)
        self.A_notional = A_notional
        self.L_notional = L_notional
        self.tau_A = tau_A
        self.tau_L = tau_L
        self.dt = dt
        self.lambda_h = lambda_h
        self.kappa_u = kappa_u
        self.u_scale = u_scale
        self.no_trade_eps = no_trade_eps
        self.episode_len = episode_len
        self.center_state = center_state
        self.rng = np.random.default_rng(seed)
        # State: (L, S, C, h)
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32
        )
        # Action: scalar in [-1, 1]
        self.action_space = spaces.Box(
            low=-1.0, high=1.0, shape=(1,), dtype=np.float32
        )
        self.t = 0
        self.X = None
        self.h = None

    # ---------- AFNS helpers ----------
    def _y(self, X, tau):
        return float(
            afns_yields_from_factors(
                X,
                np.array([tau]),
                self.cm.lam,
                self.cm.sig1,
                self.cm.sig2,
                self.cm.sig3,
            )[0]
        )

    def _fwd_cc(self, X, tau1, tau2):
        # continuously compounded forward rate between tau1 and tau2
        y1 = self._y(X, tau1)
        y2 = self._y(X, tau2)
        return (tau2 * y2 - tau1 * y1) / (tau2 - tau1)

    def _nii_step(self, X_t, X_next, h_t):
        yA = self._y(X_t, self.tau_A)
        yL = self._y(X_t, self.tau_L)
        K_t = self._fwd_cc(X_t, self.tau_L, self.tau_L + self.dt)
        y_float_next = self._y(X_next, self.tau_L)
        return (
            self.A_notional * yA * self.dt
            - self.L_notional * yL * self.dt
            + h_t * (K_t - y_float_next) * self.dt
        )

    # ---------- Gym API ----------
    def _obs(self):
        obs = np.array([self.X[0], self.X[1], self.X[2], self.h], dtype=np.float32)
        if self.center_state:
            obs[:3] -= self.X_ref.astype(np.float32)
        return obs

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng = np.random.default_rng(seed)
        self.t = 0
        # start near the reference state with a small random perturbation
        self.X = self.X_ref + self.rng.multivariate_normal(
            np.zeros(3), 0.1 * self.Sigma
        )
        self.h = 0.0
        return self._obs(), {}

    def step(self, action):
        u_raw = float(np.clip(action[0], -1.0, 1.0))
        u = u_raw * self.u_scale
        if abs(u) < self.no_trade_eps:
            u = 0.0
        # Next factors
        eps = self.rng.multivariate_normal(np.zeros(3), self.Sigma)
        X_next = self.c + self.Phi @ self.X + eps
        # NII over the step is earned on the position held before trading
        nii = self._nii_step(self.X, X_next, self.h)
        # Update hedge
        h_next = self.h + u
        # L1 reward
        reward = -(
            nii**2 + self.lambda_h * (self.h**2) + self.kappa_u * abs(u)
        )
        self.X = X_next
        self.h = h_next
        self.t += 1
        terminated = False
        truncated = self.t >= self.episode_len
        info = {"nii": nii, "h": self.h, "u": u}
        return self._obs(), float(reward), terminated, truncated, info
def train_sac_l1(
    cm,
    X_smooth,
    c_hat,
    Phi_hat,
    Sigma_hat,
    X_ref,
    lambda_h,
    kappa_u,
    total_timesteps=800_000,
    n_envs=8,
    u_scale=25.0,
    no_trade_eps=0.0,
    seed=123,
):
    # Note: X_smooth is accepted but not used in this function.
    def make_env():
        return IrrbbNiiHedgeEnvL1(
            cm=cm,
            c_hat=c_hat,
            Phi_hat=Phi_hat,
            Sigma_hat=Sigma_hat,
            X_ref=X_ref,
            lambda_h=lambda_h,
            kappa_u=kappa_u,
            u_scale=u_scale,
            no_trade_eps=no_trade_eps,
            seed=seed,
        )

    vec_env = make_vec_env(make_env, n_envs=n_envs)
    model = SAC(
        "MlpPolicy",
        vec_env,
        learning_rate=3e-4,
        batch_size=256,
        buffer_size=300_000,
        gamma=0.99,
        tau=0.005,
        train_freq=1,
        gradient_steps=1,
        verbose=1,
    )
    model.learn(total_timesteps=total_timesteps)
    return model
def eval_L1_cost(NII, h, u, lambda_h, kappa_u):
    """
    Average L1-based economic cost.
    """
    # h has one more entry than NII and u, so drop its last element
    return float(
        np.mean(NII**2 + lambda_h * (h[:-1] ** 2) + kappa_u * np.abs(u))
    )
lambda_h_rl = lambda_h_star # keep inventory discipline
kappa_u_rl = 3e-4 # L1 trading cost
u_scale = 25.0 # hedge impact scale
model_l1 = train_sac_l1(
    cm=cm,
    X_smooth=X_smooth,
    c_hat=c_hat,
    Phi_hat=Phi_hat,
    Sigma_hat=Sigma_hat,
    X_ref=X_ref,
    lambda_h=lambda_h_rl,
    kappa_u=kappa_u_rl,
    total_timesteps=800_000,  # increase to 1–2M if needed
    n_envs=8,
    u_scale=u_scale,
    no_trade_eps=0.0,  # later try 0.5 or 1.0
)
Using cpu device
SAC training log (ep_len_mean = 240 and learning_rate = 0.0003 at every logged update):
| total_timesteps | episodes | ep_rew_mean | fps | time_elapsed | actor_loss | critic_loss | ent_coef | ent_coef_loss | n_updates |
|---|---|---|---|---|---|---|---|---|---|
| 1920 | 4 | -2.42 | 148 | 12 | 0.77 | 0.272 | 0.934 | -0.105 | 227 |
| 3840 | 12 | -2.07 | 144 | 26 | -0.389 | 0.0759 | 0.869 | -0.234 | 467 |
| 5760 | 20 | -1.93 | 143 | 40 | -0.938 | 0.0514 | 0.809 | -0.325 | 707 |
| 7680 | 28 | -1.82 | 143 | 53 | -1.49 | 0.0354 | 0.754 | -0.46 | 947 |
| 9600 | 36 | -1.77 | 144 | 66 | -1.92 | 0.0337 | 0.701 | -0.576 | 1187 |
| 11520 | 44 | -1.7 | 143 | 80 | -2.31 | 0.0165 | 0.653 | -0.677 | 1427 |
| 13440 | 52 | -1.68 | 143 | 93 | -2.66 | 0.0179 | 0.607 | -0.811 | 1667 |
| 15360 | 60 | -1.65 | 143 | 107 | -2.95 | 0.0265 | 0.564 | -0.935 | 1907 |
| 17280 | 68 | -1.64 | 142 | 121 | -3.28 | 0.0394 | 0.525 | -1.03 | 2147 |
| 19200 | 76 | -1.63 | 142 | 134 | -3.64 | 0.0172 | 0.488 | -1.18 | 2387 |
| 21120 | 84 | -1.65 | 142 | 147 | -3.82 | 0.0178 | 0.454 | -1.28 | 2627 |
| 23040 | 92 | -1.64 | 142 | 161 | -4 | 0.0157 | 0.422 | -1.4 | 2867 |
| 24960 | 100 | -1.61 | 143 | 174 | -4.31 | 0.00948 | 0.393 | -1.54 | 3107 |
| 26880 | 108 | -1.59 | 143 | 186 | -4.39 | 0.0143 | 0.365 | -1.64 | 3347 |
| 28800 | 116 | -1.62 | 144 | 199 | -4.64 | 0.00457 | 0.339 | -1.77 | 3587 |
| 30720 | 124 | -1.62 | 144 | 212 | -4.72 | 0.0119 | 0.316 | -1.88 | 3827 |
| 32640 | 132 | -1.62 | 144 | 226 | -4.9 | 0.00466 | 0.294 | -1.98 | 4067 |
| 34560 | 140 | -1.65 | 143 | 240 | -5 | 0.0024 | 0.273 | -2.16 | 4307 |
| 36480 | 148 | -1.68 | 143 | 253 | -5.02 | 0.00781 | 0.254 | -2.18 | 4547 |
| 38400 | 156 | -1.69 | 144 | 266 | -5.2 | 0.00368 | 0.236 | -2.3 | 4787 |
| 40320 | 164 | -1.71 | 143 | 280 | -5.21 | 0.00507 | 0.22 | -2.42 | 5027 |
| 42240 | 172 | -1.73 | 143 | 293 | -5.33 | 0.00442 | 0.205 | -2.6 | 5267 |
| 44160 | 180 | -1.71 | 144 | 305 | -5.38 | 0.00173 | 0.19 | -2.73 | 5507 |
| 46080 | 188 | -1.71 | 147 | 311 | -5.43 | 0.0016 | 0.177 | -2.86 | 5747 |
| 48000 | 196 | -1.73 | 150 | 318 | -5.47 | 0.00188 | 0.165 | -2.95 | 5987 |
| 49920 | 204 | -1.73 | 152 | 328 | -5.5 | 0.00158 | 0.153 | -3.06 | 6227 |
| 51840 | 212 | -1.71 | 151 | 342 | -5.54 | 0.000549 | 0.143 | -3.15 | 6467 |
| 53760 | 220 | -1.69 | 152 | 351 | -5.53 | 0.00162 | 0.133 | -3.29 | 6707 |
| 55680 | 228 | -1.7 | 154 | 360 | -5.53 | 0.0054 | 0.123 | -3.47 | 6947 |
| 57600 | 236 | -1.68 | 156 | 368 | -5.5 | 0.00273 | 0.115 | -3.6 | 7187 |
| 59520 | 244 | -1.7 | 157 | 376 | -5.56 | 0.00132 | 0.107 | -3.64 | 7427 |
| 61440 | 252 | -1.7 | 159 | 384 | -5.53 | 0.00157 | 0.0994 | -3.77 | 7667 |
| 63360 | 260 | -1.7 | 162 | 391 | -5.5 | 0.00114 | 0.0925 | -3.94 | 7907 |
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.68 | | time/ | | | episodes | 268 | | fps | 164 | | time_elapsed | 397 | | total_timesteps | 65280 | | train/ | | | actor_loss | -5.47 | | critic_loss | 0.000751 | | ent_coef | 0.0861 | | ent_coef_loss | -4.03 | | learning_rate | 0.0003 | | n_updates | 8147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.68 | | time/ | | | episodes | 272 | | fps | 164 | | time_elapsed | 397 | | total_timesteps | 65280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.71 | | time/ | | | episodes | 276 | | fps | 166 | | time_elapsed | 403 | | total_timesteps | 67200 | | train/ | | | actor_loss | -5.48 | | critic_loss | 0.00186 | | ent_coef | 0.0801 | | ent_coef_loss | -4.1 | | learning_rate | 0.0003 | | n_updates | 8387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.71 | | time/ | | | episodes | 280 | | fps | 166 | | time_elapsed | 403 | | total_timesteps | 67200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.72 | | time/ | | | episodes | 284 | | fps | 168 | | time_elapsed | 409 | | total_timesteps | 69120 | | train/ | | | actor_loss | -5.46 | | critic_loss | 0.000673 | | ent_coef | 0.0745 | | ent_coef_loss | -4.28 | | learning_rate | 0.0003 | | n_updates | 8627 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.72 | | time/ | | | episodes | 288 | | fps | 168 | | time_elapsed | 409 | | total_timesteps | 69120 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.69 | | time/ | | | episodes | 292 | | fps | 170 | | time_elapsed | 415 | | 
total_timesteps | 71040 | | train/ | | | actor_loss | -5.42 | | critic_loss | 0.00792 | | ent_coef | 0.0694 | | ent_coef_loss | -4.44 | | learning_rate | 0.0003 | | n_updates | 8867 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.69 | | time/ | | | episodes | 296 | | fps | 170 | | time_elapsed | 415 | | total_timesteps | 71040 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.69 | | time/ | | | episodes | 300 | | fps | 172 | | time_elapsed | 422 | | total_timesteps | 72960 | | train/ | | | actor_loss | -5.36 | | critic_loss | 0.000957 | | ent_coef | 0.0646 | | ent_coef_loss | -4.49 | | learning_rate | 0.0003 | | n_updates | 9107 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.69 | | time/ | | | episodes | 304 | | fps | 172 | | time_elapsed | 422 | | total_timesteps | 72960 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 308 | | fps | 174 | | time_elapsed | 429 | | total_timesteps | 74880 | | train/ | | | actor_loss | -5.3 | | critic_loss | 0.0058 | | ent_coef | 0.0601 | | ent_coef_loss | -4.67 | | learning_rate | 0.0003 | | n_updates | 9347 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 312 | | fps | 174 | | time_elapsed | 429 | | total_timesteps | 74880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.66 | | time/ | | | episodes | 316 | | fps | 176 | | time_elapsed | 435 | | total_timesteps | 76800 | | train/ | | | actor_loss | -5.27 | | critic_loss | 0.0018 | | ent_coef | 0.0559 | | ent_coef_loss | -4.65 | | learning_rate | 0.0003 | | n_updates | 
9587 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.66 | | time/ | | | episodes | 320 | | fps | 176 | | time_elapsed | 435 | | total_timesteps | 76800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 324 | | fps | 178 | | time_elapsed | 442 | | total_timesteps | 78720 | | train/ | | | actor_loss | -5.25 | | critic_loss | 0.00153 | | ent_coef | 0.052 | | ent_coef_loss | -4.82 | | learning_rate | 0.0003 | | n_updates | 9827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 328 | | fps | 178 | | time_elapsed | 442 | | total_timesteps | 78720 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.69 | | time/ | | | episodes | 332 | | fps | 179 | | time_elapsed | 448 | | total_timesteps | 80640 | | train/ | | | actor_loss | -5.21 | | critic_loss | 0.000667 | | ent_coef | 0.0484 | | ent_coef_loss | -4.96 | | learning_rate | 0.0003 | | n_updates | 10067 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.69 | | time/ | | | episodes | 336 | | fps | 179 | | time_elapsed | 448 | | total_timesteps | 80640 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.69 | | time/ | | | episodes | 340 | | fps | 181 | | time_elapsed | 454 | | total_timesteps | 82560 | | train/ | | | actor_loss | -5.21 | | critic_loss | 0.00375 | | ent_coef | 0.0451 | | ent_coef_loss | -5.13 | | learning_rate | 0.0003 | | n_updates | 10307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.69 | | time/ | | | episodes | 344 | | fps 
| 181 | | time_elapsed | 454 | | total_timesteps | 82560 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.66 | | time/ | | | episodes | 348 | | fps | 183 | | time_elapsed | 460 | | total_timesteps | 84480 | | train/ | | | actor_loss | -5.15 | | critic_loss | 0.000259 | | ent_coef | 0.0419 | | ent_coef_loss | -5.26 | | learning_rate | 0.0003 | | n_updates | 10547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.66 | | time/ | | | episodes | 352 | | fps | 183 | | time_elapsed | 460 | | total_timesteps | 84480 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 356 | | fps | 184 | | time_elapsed | 467 | | total_timesteps | 86400 | | train/ | | | actor_loss | -5.11 | | critic_loss | 0.000264 | | ent_coef | 0.039 | | ent_coef_loss | -5.29 | | learning_rate | 0.0003 | | n_updates | 10787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 360 | | fps | 184 | | time_elapsed | 467 | | total_timesteps | 86400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.68 | | time/ | | | episodes | 364 | | fps | 186 | | time_elapsed | 473 | | total_timesteps | 88320 | | train/ | | | actor_loss | -4.98 | | critic_loss | 0.00584 | | ent_coef | 0.0363 | | ent_coef_loss | -5.49 | | learning_rate | 0.0003 | | n_updates | 11027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.68 | | time/ | | | episodes | 368 | | fps | 186 | | time_elapsed | 473 | | total_timesteps | 88320 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | 
ep_rew_mean | -1.67 | | time/ | | | episodes | 372 | | fps | 188 | | time_elapsed | 479 | | total_timesteps | 90240 | | train/ | | | actor_loss | -4.95 | | critic_loss | 0.00339 | | ent_coef | 0.0338 | | ent_coef_loss | -5.59 | | learning_rate | 0.0003 | | n_updates | 11267 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 376 | | fps | 188 | | time_elapsed | 479 | | total_timesteps | 90240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.65 | | time/ | | | episodes | 380 | | fps | 189 | | time_elapsed | 486 | | total_timesteps | 92160 | | train/ | | | actor_loss | -4.95 | | critic_loss | 0.00048 | | ent_coef | 0.0314 | | ent_coef_loss | -5.51 | | learning_rate | 0.0003 | | n_updates | 11507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.65 | | time/ | | | episodes | 384 | | fps | 189 | | time_elapsed | 486 | | total_timesteps | 92160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.65 | | time/ | | | episodes | 388 | | fps | 190 | | time_elapsed | 494 | | total_timesteps | 94080 | | train/ | | | actor_loss | -4.92 | | critic_loss | 0.000432 | | ent_coef | 0.0293 | | ent_coef_loss | -5.72 | | learning_rate | 0.0003 | | n_updates | 11747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.65 | | time/ | | | episodes | 392 | | fps | 190 | | time_elapsed | 494 | | total_timesteps | 94080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.65 | | time/ | | | episodes | 396 | | fps | 190 | | time_elapsed | 502 | | total_timesteps | 96000 | | train/ | | | actor_loss | -4.86 | | critic_loss | 
0.00018 | | ent_coef | 0.0272 | | ent_coef_loss | -5.83 | | learning_rate | 0.0003 | | n_updates | 11987 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.65 | | time/ | | | episodes | 400 | | fps | 190 | | time_elapsed | 502 | | total_timesteps | 96000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.63 | | time/ | | | episodes | 404 | | fps | 191 | | time_elapsed | 510 | | total_timesteps | 97920 | | train/ | | | actor_loss | -4.81 | | critic_loss | 0.000426 | | ent_coef | 0.0254 | | ent_coef_loss | -5.89 | | learning_rate | 0.0003 | | n_updates | 12227 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.63 | | time/ | | | episodes | 408 | | fps | 191 | | time_elapsed | 510 | | total_timesteps | 97920 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 412 | | fps | 192 | | time_elapsed | 517 | | total_timesteps | 99840 | | train/ | | | actor_loss | -4.76 | | critic_loss | 0.00282 | | ent_coef | 0.0236 | | ent_coef_loss | -6.06 | | learning_rate | 0.0003 | | n_updates | 12467 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 416 | | fps | 192 | | time_elapsed | 517 | | total_timesteps | 99840 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 420 | | fps | 194 | | time_elapsed | 524 | | total_timesteps | 101760 | | train/ | | | actor_loss | -4.71 | | critic_loss | 0.000315 | | ent_coef | 0.022 | | ent_coef_loss | -6.27 | | learning_rate | 0.0003 | | n_updates | 12707 | --------------------------------- 
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.67 | | time/ | | | episodes | 424 | | fps | 194 | | time_elapsed | 524 | | total_timesteps | 101760 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.64 | | time/ | | | episodes | 428 | | fps | 195 | | time_elapsed | 531 | | total_timesteps | 103680 | | train/ | | | actor_loss | -4.62 | | critic_loss | 0.0102 | | ent_coef | 0.0205 | | ent_coef_loss | -6.33 | | learning_rate | 0.0003 | | n_updates | 12947 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.64 | | time/ | | | episodes | 432 | | fps | 195 | | time_elapsed | 531 | | total_timesteps | 103680 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.62 | | time/ | | | episodes | 436 | | fps | 195 | | time_elapsed | 539 | | total_timesteps | 105600 | | train/ | | | actor_loss | -4.6 | | critic_loss | 0.000208 | | ent_coef | 0.0191 | | ent_coef_loss | -6.36 | | learning_rate | 0.0003 | | n_updates | 13187 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.62 | | time/ | | | episodes | 440 | | fps | 195 | | time_elapsed | 539 | | total_timesteps | 105600 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.6 | | time/ | | | episodes | 444 | | fps | 196 | | time_elapsed | 547 | | total_timesteps | 107520 | | train/ | | | actor_loss | -4.54 | | critic_loss | 0.000192 | | ent_coef | 0.0177 | | ent_coef_loss | -6.49 | | learning_rate | 0.0003 | | n_updates | 13427 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.6 | | time/ | | | episodes | 448 | | fps | 196 | | time_elapsed | 547 | | 
total_timesteps | 107520 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.59 | | time/ | | | episodes | 452 | | fps | 196 | | time_elapsed | 556 | | total_timesteps | 109440 | | train/ | | | actor_loss | -4.5 | | critic_loss | 0.0029 | | ent_coef | 0.0165 | | ent_coef_loss | -6.55 | | learning_rate | 0.0003 | | n_updates | 13667 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.59 | | time/ | | | episodes | 456 | | fps | 196 | | time_elapsed | 556 | | total_timesteps | 109440 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.58 | | time/ | | | episodes | 460 | | fps | 196 | | time_elapsed | 566 | | total_timesteps | 111360 | | train/ | | | actor_loss | -4.42 | | critic_loss | 0.00189 | | ent_coef | 0.0154 | | ent_coef_loss | -6.66 | | learning_rate | 0.0003 | | n_updates | 13907 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.58 | | time/ | | | episodes | 464 | | fps | 196 | | time_elapsed | 566 | | total_timesteps | 111360 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.52 | | time/ | | | episodes | 468 | | fps | 196 | | time_elapsed | 575 | | total_timesteps | 113280 | | train/ | | | actor_loss | -4.4 | | critic_loss | 0.000233 | | ent_coef | 0.0143 | | ent_coef_loss | -6.8 | | learning_rate | 0.0003 | | n_updates | 14147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.52 | | time/ | | | episodes | 472 | | fps | 196 | | time_elapsed | 575 | | total_timesteps | 113280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.52 | | time/ | | | 
episodes | 476 | | fps | 197 | | time_elapsed | 584 | | total_timesteps | 115200 | | train/ | | | actor_loss | -4.34 | | critic_loss | 0.000121 | | ent_coef | 0.0133 | | ent_coef_loss | -6.9 | | learning_rate | 0.0003 | | n_updates | 14387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.52 | | time/ | | | episodes | 480 | | fps | 197 | | time_elapsed | 584 | | total_timesteps | 115200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.5 | | time/ | | | episodes | 484 | | fps | 197 | | time_elapsed | 592 | | total_timesteps | 117120 | | train/ | | | actor_loss | -4.27 | | critic_loss | 0.000476 | | ent_coef | 0.0124 | | ent_coef_loss | -6.9 | | learning_rate | 0.0003 | | n_updates | 14627 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.5 | | time/ | | | episodes | 488 | | fps | 197 | | time_elapsed | 592 | | total_timesteps | 117120 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.49 | | time/ | | | episodes | 492 | | fps | 198 | | time_elapsed | 600 | | total_timesteps | 119040 | | train/ | | | actor_loss | -4.22 | | critic_loss | 0.00155 | | ent_coef | 0.0116 | | ent_coef_loss | -7.05 | | learning_rate | 0.0003 | | n_updates | 14867 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.49 | | time/ | | | episodes | 496 | | fps | 198 | | time_elapsed | 600 | | total_timesteps | 119040 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.51 | | time/ | | | episodes | 500 | | fps | 198 | | time_elapsed | 608 | | total_timesteps | 120960 | | train/ | | | actor_loss | -4.19 | | critic_loss | 0.000329 | | ent_coef | 0.0108 | 
| ent_coef_loss | -6.91 | | learning_rate | 0.0003 | | n_updates | 15107 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.51 | | time/ | | | episodes | 504 | | fps | 198 | | time_elapsed | 608 | | total_timesteps | 120960 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.48 | | time/ | | | episodes | 508 | | fps | 199 | | time_elapsed | 616 | | total_timesteps | 122880 | | train/ | | | actor_loss | -4.13 | | critic_loss | 0.000231 | | ent_coef | 0.0101 | | ent_coef_loss | -6.97 | | learning_rate | 0.0003 | | n_updates | 15347 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.48 | | time/ | | | episodes | 512 | | fps | 199 | | time_elapsed | 616 | | total_timesteps | 122880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.45 | | time/ | | | episodes | 516 | | fps | 199 | | time_elapsed | 624 | | total_timesteps | 124800 | | train/ | | | actor_loss | -4.08 | | critic_loss | 0.000401 | | ent_coef | 0.00939 | | ent_coef_loss | -6.95 | | learning_rate | 0.0003 | | n_updates | 15587 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.45 | | time/ | | | episodes | 520 | | fps | 199 | | time_elapsed | 624 | | total_timesteps | 124800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.43 | | time/ | | | episodes | 524 | | fps | 200 | | time_elapsed | 632 | | total_timesteps | 126720 | | train/ | | | actor_loss | -4.01 | | critic_loss | 0.000151 | | ent_coef | 0.00875 | | ent_coef_loss | -7.14 | | learning_rate | 0.0003 | | n_updates | 15827 | --------------------------------- --------------------------------- | rollout/ | | | 
ep_len_mean | 240 | | ep_rew_mean | -1.43 | | time/ | | | episodes | 528 | | fps | 200 | | time_elapsed | 632 | | total_timesteps | 126720 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.41 | | time/ | | | episodes | 532 | | fps | 200 | | time_elapsed | 640 | | total_timesteps | 128640 | | train/ | | | actor_loss | -3.97 | | critic_loss | 0.000531 | | ent_coef | 0.00815 | | ent_coef_loss | -7.45 | | learning_rate | 0.0003 | | n_updates | 16067 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.41 | | time/ | | | episodes | 536 | | fps | 200 | | time_elapsed | 640 | | total_timesteps | 128640 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.41 | | time/ | | | episodes | 540 | | fps | 201 | | time_elapsed | 649 | | total_timesteps | 130560 | | train/ | | | actor_loss | -3.92 | | critic_loss | 0.000504 | | ent_coef | 0.0076 | | ent_coef_loss | -7.48 | | learning_rate | 0.0003 | | n_updates | 16307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.41 | | time/ | | | episodes | 544 | | fps | 201 | | time_elapsed | 649 | | total_timesteps | 130560 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.41 | | time/ | | | episodes | 548 | | fps | 201 | | time_elapsed | 657 | | total_timesteps | 132480 | | train/ | | | actor_loss | -3.88 | | critic_loss | 0.000214 | | ent_coef | 0.00708 | | ent_coef_loss | -7.53 | | learning_rate | 0.0003 | | n_updates | 16547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.41 | | time/ | | | episodes | 552 | | fps | 201 | | time_elapsed | 657 | | total_timesteps | 132480 | 
--------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.4 | | time/ | | | episodes | 556 | | fps | 201 | | time_elapsed | 666 | | total_timesteps | 134400 | | train/ | | | actor_loss | -3.83 | | critic_loss | 0.000813 | | ent_coef | 0.00661 | | ent_coef_loss | -6.56 | | learning_rate | 0.0003 | | n_updates | 16787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.4 | | time/ | | | episodes | 560 | | fps | 201 | | time_elapsed | 666 | | total_timesteps | 134400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.4 | | time/ | | | episodes | 564 | | fps | 201 | | time_elapsed | 675 | | total_timesteps | 136320 | | train/ | | | actor_loss | -3.76 | | critic_loss | 0.00046 | | ent_coef | 0.00618 | | ent_coef_loss | -6.6 | | learning_rate | 0.0003 | | n_updates | 17027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.4 | | time/ | | | episodes | 568 | | fps | 201 | | time_elapsed | 675 | | total_timesteps | 136320 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.38 | | time/ | | | episodes | 572 | | fps | 202 | | time_elapsed | 683 | | total_timesteps | 138240 | | train/ | | | actor_loss | -3.71 | | critic_loss | 0.000848 | | ent_coef | 0.00578 | | ent_coef_loss | -7.51 | | learning_rate | 0.0003 | | n_updates | 17267 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.38 | | time/ | | | episodes | 576 | | fps | 202 | | time_elapsed | 683 | | total_timesteps | 138240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.36 | | time/ | | | episodes | 580 | | fps 
| 202 | | time_elapsed | 690 | | total_timesteps | 140160 | | train/ | | | actor_loss | -3.67 | | critic_loss | 0.000322 | | ent_coef | 0.00539 | | ent_coef_loss | -7.13 | | learning_rate | 0.0003 | | n_updates | 17507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.36 | | time/ | | | episodes | 584 | | fps | 202 | | time_elapsed | 690 | | total_timesteps | 140160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.34 | | time/ | | | episodes | 588 | | fps | 203 | | time_elapsed | 699 | | total_timesteps | 142080 | | train/ | | | actor_loss | -3.62 | | critic_loss | 0.000638 | | ent_coef | 0.00503 | | ent_coef_loss | -6.8 | | learning_rate | 0.0003 | | n_updates | 17747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.34 | | time/ | | | episodes | 592 | | fps | 203 | | time_elapsed | 699 | | total_timesteps | 142080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.33 | | time/ | | | episodes | 596 | | fps | 203 | | time_elapsed | 707 | | total_timesteps | 144000 | | train/ | | | actor_loss | -3.57 | | critic_loss | 0.000322 | | ent_coef | 0.0047 | | ent_coef_loss | -7.15 | | learning_rate | 0.0003 | | n_updates | 17987 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.33 | | time/ | | | episodes | 600 | | fps | 203 | | time_elapsed | 707 | | total_timesteps | 144000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -1.33 | | time/ | | | episodes | 604 | | fps | 204 | | time_elapsed | 714 | | total_timesteps | 145920 | | train/ | | | actor_loss | -3.52 | | critic_loss | 0.000115 | | ent_coef | 0.00439 | | ent_coef_loss 
| -7.48 | | learning_rate | 0.0003 | | n_updates | 18227 | ---------------------------------
Training log, one row per logged record (ep_len_mean = 240 and learning_rate = 0.0003 in every record; train/ columns are blank for rollout-only records):
| episodes | total_timesteps | time_elapsed | fps | ep_rew_mean | actor_loss | critic_loss | ent_coef | ent_coef_loss | n_updates |
|----------|-----------------|--------------|-----|-------------|------------|-------------|----------|---------------|-----------|
| 608 | 145920 | 714 | 204 | -1.33 | | | | | |
| 612 | 147840 | 721 | 204 | -1.31 | -3.47 | 0.000181 | 0.00409 | -7.83 | 18467 |
| 616 | 147840 | 721 | 204 | -1.31 | | | | | |
| 620 | 149760 | 730 | 205 | -1.3 | -3.43 | 0.000788 | 0.00382 | -7.5 | 18707 |
| 624 | 149760 | 730 | 205 | -1.3 | | | | | |
| 628 | 151680 | 740 | 204 | -1.29 | -3.39 | 0.000237 | 0.00356 | -7.82 | 18947 |
| 632 | 151680 | 740 | 204 | -1.29 | | | | | |
| 636 | 153600 | 748 | 205 | -1.28 | -3.35 | 0.00076 | 0.00332 | -7.44 | 19187 |
| 640 | 153600 | 748 | 205 | -1.28 | | | | | |
| 644 | 155520 | 756 | 205 | -1.26 | -3.29 | 0.000223 | 0.0031 | -7.44 | 19427 |
| 648 | 155520 | 756 | 205 | -1.26 | | | | | |
| 652 | 157440 | 766 | 205 | -1.23 | -3.25 | 0.000113 | 0.0029 | -6.65 | 19667 |
| 656 | 157440 | 766 | 205 | -1.23 | | | | | |
| 660 | 159360 | 773 | 205 | -1.2 | -3.2 | 0.000211 | 0.00272 | -7.51 | 19907 |
| 664 | 159360 | 773 | 205 | -1.2 | | | | | |
| 668 | 161280 | 780 | 206 | -1.19 | -3.16 | 0.00015 | 0.00254 | -6.74 | 20147 |
| 672 | 161280 | 780 | 206 | -1.19 | | | | | |
| 676 | 163200 | 786 | 207 | -1.18 | -3.09 | 0.000908 | 0.00238 | -7.36 | 20387 |
| 680 | 163200 | 786 | 207 | -1.18 | | | | | |
| 684 | 165120 | 793 | 208 | -1.15 | -3.07 | 0.000178 | 0.00223 | -4.22 | 20627 |
| 688 | 165120 | 793 | 208 | -1.15 | | | | | |
| 692 | 167040 | 801 | 208 | -1.13 | -3.02 | 0.000184 | 0.00213 | -5.1 | 20867 |
| 696 | 167040 | 801 | 208 | -1.13 | | | | | |
| 700 | 168960 | 807 | 209 | -1.11 | -2.98 | 0.000333 | 0.00202 | -5.28 | 21107 |
| 704 | 168960 | 807 | 209 | -1.11 | | | | | |
| 708 | 170880 | 815 | 209 | -1.08 | -2.94 | 0.000182 | 0.00189 | -6.4 | 21347 |
| 712 | 170880 | 815 | 209 | -1.08 | | | | | |
| 716 | 172800 | 822 | 210 | -1.04 | -2.9 | 0.000238 | 0.00177 | -5.59 | 21587 |
| 720 | 172800 | 822 | 210 | -1.04 | | | | | |
| 724 | 174720 | 828 | 210 | -1.01 | -2.86 | 0.000349 | 0.00165 | -6.11 | 21827 |
| 728 | 174720 | 828 | 210 | -1.01 | | | | | |
| 732 | 176640 | 835 | 211 | -1 | -2.83 | 0.000119 | 0.00154 | -5.74 | 22067 |
| 736 | 176640 | 835 | 211 | -1 | | | | | |
| 740 | 178560 | 842 | 211 | -0.974 | -2.76 | 0.00093 | 0.00144 | -5.75 | 22307 |
| 744 | 178560 | 842 | 211 | -0.974 | | | | | |
| 748 | 180480 | 850 | 212 | -0.948 | -2.75 | 0.000382 | 0.00134 | -6.73 | 22547 |
| 752 | 180480 | 850 | 212 | -0.948 | | | | | |
| 756 | 182400 | 857 | 212 | -0.926 | -2.7 | 0.00105 | 0.00126 | -6.14 | 22787 |
| 760 | 182400 | 857 | 212 | -0.926 | | | | | |
| 764 | 184320 | 863 | 213 | -0.887 | -2.66 | 0.000415 | 0.00118 | -5.24 | 23027 |
| 768 | 184320 | 863 | 213 | -0.887 | | | | | |
| 772 | 186240 | 870 | 213 | -0.866 | -2.64 | 0.000142 | 0.00111 | -5.55 | 23267 |
| 776 | 186240 | 870 | 213 | -0.866 | | | | | |
| 780 | 188160 | 876 | 214 | -0.834 | -2.59 | 0.000141 | 0.00105 | -5.35 | 23507 |
| 784 | 188160 | 876 | 214 | -0.834 | | | | | |
| 788 | 190080 | 884 | 214 | -0.821 | -2.57 | 0.000169 | 0.000988 | -5.11 | 23747 |
| 792 | 190080 | 884 | 214 | -0.821 | | | | | |
| 796 | 192000 | 890 | 215 | -0.804 | -2.53 | 0.000137 | 0.000932 | -4.63 | 23987 |
| 800 | 192000 | 890 | 215 | -0.804 | | | | | |
| 804 | 193920 | 896 | 216 | -0.766 | -2.49 | 0.000516 | 0.000882 | -4.49 | 24227 |
| 808 | 193920 | 896 | 216 | -0.766 | | | | | |
| 812 | 195840 | 903 | 216 | -0.75 | -2.46 | 0.000126 | 0.000832 | -4.5 | 24467 |
| 816 | 195840 | 903 | 216 | -0.75 | | | | | |
| 820 | 197760 | 909 | 217 | -0.744 | -2.42 | 0.000132 | 0.000784 | -4.07 | 24707 |
| 824 | 197760 | 909 | 217 | -0.744 | | | | | |
| 828 | 199680 | 916 | 217 | -0.718 | -2.39 | 0.000302 | 0.000739 | -3.89 | 24947 |
| 832 | 199680 | 916 | 217 | -0.718 | | | | | |
| 836 | 201600 | 922 | 218 | -0.698 | -2.35 | 0.000462 | 0.000699 | -2.07 | 25187 |
| 840 | 201600 | 922 | 218 | -0.698 | | | | | |
| 844 | 203520 | 930 | 218 | -0.688 | -2.32 | 0.000811 | 0.000661 | -3.51 | 25427 |
| 848 | 203520 | 930 | 218 | -0.688 | | | | | |
| 852 | 205440 | 939 | 218 | -0.677 | -2.28 | 0.00013 | 0.000627 | -1.9 | 25667 |
| 856 | 205440 | 939 | 218 | -0.677 | | | | | |
| 860 | 207360 | 948 | 218 | -0.659 | -2.24 | 0.00023 | 0.000599 | -2.45 | 25907 |
| 864 | 207360 | 948 | 218 | -0.659 | | | | | |
| 868 | 209280 | 956 | 218 | -0.652 | -2.22 | 0.00011 | 0.000573 | -2.88 | 26147 |
| 872 | 209280 | 956 | 218 | -0.652 | | | | | |
| 876 | 211200 | 964 | 218 | -0.644 | -2.19 | 0.000159 | 0.000546 | -2.01 | 26387 |
| 880 | 211200 | 964 | 218 | -0.644 | | | | | |
| 884 | 213120 | 969 | 219 | -0.642 | -2.17 | 0.000353 | 0.00052 | -1.89 | 26627 |
| 888 | 213120 | 969 | 219 | -0.642 | | | | | |
| 892 | 215040 | 975 | 220 | -0.632 | -2.12 | 0.000202 | 0.000497 | -0.648 | 26867 |
| 896 | 215040 | 975 | 220 | -0.632 | | | | | |
| 900 | 216960 | 982 | 220 | -0.638 | -2.1 | 0.000139 | 0.000476 | -1.49 | 27107 |
| 904 | 216960 | 982 | 220 | -0.638 | | | | | |
| 908 | 218880 | 990 | 221 | -0.647 | -2.06 | 0.000129 | 0.000459 | -1.32 | 27347 |
| 912 | 218880 | 990 | 221 | -0.647 | | | | | |
| 916 | 220800 | 998 | 221 | -0.626 | -2.04 | 0.000146 | 0.000443 | -0.663 | 27587 |
| 920 | 220800 | 998 | 221 | -0.626 | | | | | |
| 924 | 222720 | 1006 | 221 | -0.637 | -2.01 | 0.000143 | 0.000427 | -1.93 | 27827 |
| 928 | 222720 | 1006 | 221 | -0.637 | | | | | |
| 932 | 224640 | 1014 | 221 | -0.635 | -1.98 | 0.000162 | 0.000414 | -0.58 | 28067 |
| 936 | 224640 | 1014 | 221 | -0.635 | | | | | |
| 940 | 226560 | 1022 | 221 | -0.624 | -1.95 | 0.00107 | 0.0004 | -1.88 | 28307 |
| 944 | 226560 | 1022 | 221 | -0.624 | | | | | |
| 948 | 228480 | 1031 | 221 | -0.617 | -1.93 | 0.00048 | 0.000388 | 0.723 | 28547 |
| 952 | 228480 | 1031 | 221 | -0.617 | | | | | |
| 956 | 230400 | 1037 | 222 | -0.622 | -1.91 | 0.000223 | 0.000382 | -0.965 | 28787 |
| 960 | 230400 | 1037 | 222 | -0.622 | | | | | |
| 964 | 232320 | 1043 | 222 | -0.637 | -1.88 | 0.000165 | 0.000371 | -0.373 | 29027 |
| 968 | 232320 | 1043 | 222 | -0.637 | | | | | |
| 972 | 234240 | 1048 | 223 | -0.64 | -1.85 | 0.000381 | 0.00036 | -0.816 | 29267 |
| 976 | 234240 | 1048 | 223 | -0.64 | | | | | |
| 980 | 236160 | 1057 | 223 | -0.619 | -1.82 | 0.000243 | 0.000359 | -0.813 | 29507 |
| 984 | 236160 | 1057 | 223 | -0.619 | | | | | |
| 988 | 238080 | 1067 | 223 | -0.622 | -1.8 | 0.000154 | 0.000362 | 0.458 | 29747 |
| 992 | 238080 | 1067 | 223 | -0.622 | | | | | |
| 996 | 240000 | 1163 | 206 | -0.61 | -1.77 | 0.000208 | 0.000367 | -0.147 | 29987 |
| 1000 | 240000 | 1163 | 206 | -0.61 | | | | | |
| 1004 | 241920 | 1179 | 205 | -0.594 | -1.74 | 0.000218 | 0.00037 | 0.498 | 30227 |
| 1008 | 241920 | 1179 | 205 | -0.594 | | | | | |
| 1012 | 243840 | 1190 | 204 | -0.58 | -1.73 | 0.00015 | 0.000376 | 0.94 | 30467 |
| 1016 | 243840 | 1190 | 204 | -0.58 | | | | | |
| 1020 | 245760 | 1200 | 204 | -0.585 | -1.7 | 0.000166 | 0.000381 | 0.524 | 30707 |
| 1024 | 245760 | 1200 | 204 | -0.585 | | | | | |
| 1028 | 247680 | 1210 | 204 | -0.572 | -1.68 | 0.000161 | 0.000385 | 0.326 | 30947 |
| 1032 | 247680 | 1210 | 204 | -0.572 | | | | | |
| 1036 | 249600 | 1220 | 204 | -0.577 | -1.65 | 0.00014 | 0.00039 | 0.769 | 31187 |
| 1040 | 249600 | 1220 | 204 | -0.577 | | | | | |
| 1044 | 251520 | 1233 | 203 | -0.61 | -1.62 | 0.000202 | 0.000392 | -0.204 | 31427 |
| 1048 | 251520 | 1233 | 203 | -0.61 | | | | | |
| 1052 | 253440 | 1242 | 204 | -0.612 | -1.6 | 0.00111 | 0.000387 | 0.612 | 31667 |
| 1056 | 253440 | 1242 | 204 | -0.612 | | | | | |
| 1060 | 255360 | 1251 | 204 | -0.626 | -1.58 | 0.000352 | 0.000386 | -1.06 | 31907 |
| 1064 | 255360 | 1251 | 204 | -0.626 | | | | | |
| 1068 | 257280 | 1260 | 204 | -0.615 | -1.57 | 0.000213 | 0.000384 | 0.34 | 32147 |
| 1072 | 257280 | 1260 | 204 | -0.615 | | | | | |
| 1076 | 259200 | 1269 | 204 | -0.62 | -1.54 | 0.000223 | 0.000385 | 0.315 | 32387 |
| 1080 | 259200 | 1269 | 204 | -0.62 | | | | | |
| 1084 | 261120 | 1278 | 204 | -0.642 | -1.5 | 0.00015 | 0.000399 | 2.59 | 32627 |
| 1088 | 261120 | 1278 | 204 | -0.642 | | | | | |
| 1092 | 263040 | 1287 | 204 | -0.647 | -1.49 | 0.000124 | 0.00041 | -0.728 | 32867 |
| 1096 | 263040 | 1287 | 204 | -0.647 | | | | | |
| 1100 | 264960 | 1301 | 203 | -0.666 | -1.47 | 0.00015 | 0.000407 | -0.348 | 33107 |
| 1104 | 264960 | 1301 | 203 | -0.666 | | | | | |
| 1108 | 266880 | 1311 | 203 | -0.673 | -1.45 | 0.000229 | 0.000407 | -0.53 | 33347 |
| 1112 | 266880 | 1311 | 203 | -0.673 | | | | | |
| 1116 | 268800 | 1325 | 202 | -0.687 | -1.43 | 0.000137 | 0.000407 | 0.109 | 33587 |
| 1120 | 268800 | 1325 | 202 | -0.687 | | | | | |
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.673 | | time/ | | | episodes | 1124 | | fps | 202
| | time_elapsed | 1337 | | total_timesteps | 270720 | | train/ | | | actor_loss | -1.4 | | critic_loss | 0.000149 | | ent_coef | 0.000407 | | ent_coef_loss | 0.207 | | learning_rate | 0.0003 | | n_updates | 33827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.673 | | time/ | | | episodes | 1128 | | fps | 202 | | time_elapsed | 1337 | | total_timesteps | 270720 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.673 | | time/ | | | episodes | 1132 | | fps | 202 | | time_elapsed | 1347 | | total_timesteps | 272640 | | train/ | | | actor_loss | -1.39 | | critic_loss | 0.000321 | | ent_coef | 0.000405 | | ent_coef_loss | -0.38 | | learning_rate | 0.0003 | | n_updates | 34067 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.673 | | time/ | | | episodes | 1136 | | fps | 202 | | time_elapsed | 1347 | | total_timesteps | 272640 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 1140 | | fps | 202 | | time_elapsed | 1356 | | total_timesteps | 274560 | | train/ | | | actor_loss | -1.35 | | critic_loss | 0.00013 | | ent_coef | 0.000404 | | ent_coef_loss | 0.514 | | learning_rate | 0.0003 | | n_updates | 34307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 1144 | | fps | 202 | | time_elapsed | 1356 | | total_timesteps | 274560 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.672 | | time/ | | | episodes | 1148 | | fps | 202 | | time_elapsed | 1366 | | total_timesteps | 276480 | | train/ | | | actor_loss | -1.34 | | critic_loss | 0.000265 | | ent_coef | 0.000412 | | 
ent_coef_loss | -0.0522 | | learning_rate | 0.0003 | | n_updates | 34547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.672 | | time/ | | | episodes | 1152 | | fps | 202 | | time_elapsed | 1366 | | total_timesteps | 276480 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.679 | | time/ | | | episodes | 1156 | | fps | 202 | | time_elapsed | 1375 | | total_timesteps | 278400 | | train/ | | | actor_loss | -1.32 | | critic_loss | 0.000269 | | ent_coef | 0.000418 | | ent_coef_loss | -0.0604 | | learning_rate | 0.0003 | | n_updates | 34787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.679 | | time/ | | | episodes | 1160 | | fps | 202 | | time_elapsed | 1375 | | total_timesteps | 278400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.688 | | time/ | | | episodes | 1164 | | fps | 201 | | time_elapsed | 1394 | | total_timesteps | 280320 | | train/ | | | actor_loss | -1.31 | | critic_loss | 0.000275 | | ent_coef | 0.000426 | | ent_coef_loss | 0.00506 | | learning_rate | 0.0003 | | n_updates | 35027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.688 | | time/ | | | episodes | 1168 | | fps | 201 | | time_elapsed | 1394 | | total_timesteps | 280320 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.678 | | time/ | | | episodes | 1172 | | fps | 194 | | time_elapsed | 1449 | | total_timesteps | 282240 | | train/ | | | actor_loss | -1.27 | | critic_loss | 0.000263 | | ent_coef | 0.00044 | | ent_coef_loss | -0.236 | | learning_rate | 0.0003 | | n_updates | 35267 | --------------------------------- 
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.678 | | time/ | | | episodes | 1176 | | fps | 194 | | time_elapsed | 1449 | | total_timesteps | 282240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.685 | | time/ | | | episodes | 1180 | | fps | 194 | | time_elapsed | 1463 | | total_timesteps | 284160 | | train/ | | | actor_loss | -1.27 | | critic_loss | 0.000127 | | ent_coef | 0.000453 | | ent_coef_loss | -0.545 | | learning_rate | 0.0003 | | n_updates | 35507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.685 | | time/ | | | episodes | 1184 | | fps | 194 | | time_elapsed | 1463 | | total_timesteps | 284160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.654 | | time/ | | | episodes | 1188 | | fps | 194 | | time_elapsed | 1474 | | total_timesteps | 286080 | | train/ | | | actor_loss | -1.25 | | critic_loss | 0.000231 | | ent_coef | 0.000456 | | ent_coef_loss | -0.059 | | learning_rate | 0.0003 | | n_updates | 35747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.654 | | time/ | | | episodes | 1192 | | fps | 194 | | time_elapsed | 1474 | | total_timesteps | 286080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.654 | | time/ | | | episodes | 1196 | | fps | 193 | | time_elapsed | 1487 | | total_timesteps | 288000 | | train/ | | | actor_loss | -1.22 | | critic_loss | 0.00016 | | ent_coef | 0.000456 | | ent_coef_loss | 0.102 | | learning_rate | 0.0003 | | n_updates | 35987 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.654 | | time/ | | | episodes | 1200 | | fps | 
193 | | time_elapsed | 1487 | | total_timesteps | 288000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.618 | | time/ | | | episodes | 1204 | | fps | 193 | | time_elapsed | 1500 | | total_timesteps | 289920 | | train/ | | | actor_loss | -1.21 | | critic_loss | 0.000127 | | ent_coef | 0.000467 | | ent_coef_loss | -0.788 | | learning_rate | 0.0003 | | n_updates | 36227 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.618 | | time/ | | | episodes | 1208 | | fps | 193 | | time_elapsed | 1500 | | total_timesteps | 289920 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.595 | | time/ | | | episodes | 1212 | | fps | 190 | | time_elapsed | 1529 | | total_timesteps | 291840 | | train/ | | | actor_loss | -1.19 | | critic_loss | 0.000365 | | ent_coef | 0.000475 | | ent_coef_loss | 0.321 | | learning_rate | 0.0003 | | n_updates | 36467 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.595 | | time/ | | | episodes | 1216 | | fps | 190 | | time_elapsed | 1529 | | total_timesteps | 291840 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.591 | | time/ | | | episodes | 1220 | | fps | 189 | | time_elapsed | 1547 | | total_timesteps | 293760 | | train/ | | | actor_loss | -1.17 | | critic_loss | 0.000136 | | ent_coef | 0.000482 | | ent_coef_loss | 0.466 | | learning_rate | 0.0003 | | n_updates | 36707 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.591 | | time/ | | | episodes | 1224 | | fps | 189 | | time_elapsed | 1548 | | total_timesteps | 293760 | --------------------------------- --------------------------------- | rollout/ 
| | | ep_len_mean | 240 | | ep_rew_mean | -0.597 | | time/ | | | episodes | 1228 | | fps | 189 | | time_elapsed | 1564 | | total_timesteps | 295680 | | train/ | | | actor_loss | -1.15 | | critic_loss | 0.000151 | | ent_coef | 0.000486 | | ent_coef_loss | 1.27 | | learning_rate | 0.0003 | | n_updates | 36947 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.597 | | time/ | | | episodes | 1232 | | fps | 189 | | time_elapsed | 1564 | | total_timesteps | 295680 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.595 | | time/ | | | episodes | 1236 | | fps | 188 | | time_elapsed | 1575 | | total_timesteps | 297600 | | train/ | | | actor_loss | -1.12 | | critic_loss | 0.000157 | | ent_coef | 0.000478 | | ent_coef_loss | 0.199 | | learning_rate | 0.0003 | | n_updates | 37187 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.595 | | time/ | | | episodes | 1240 | | fps | 188 | | time_elapsed | 1575 | | total_timesteps | 297600 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.581 | | time/ | | | episodes | 1244 | | fps | 188 | | time_elapsed | 1592 | | total_timesteps | 299520 | | train/ | | | actor_loss | -1.11 | | critic_loss | 0.00014 | | ent_coef | 0.000479 | | ent_coef_loss | 0.214 | | learning_rate | 0.0003 | | n_updates | 37427 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.581 | | time/ | | | episodes | 1248 | | fps | 188 | | time_elapsed | 1592 | | total_timesteps | 299520 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.562 | | time/ | | | episodes | 1252 | | fps | 186 | | time_elapsed | 1612 | | total_timesteps | 
301440 | | train/ | | | actor_loss | -1.1 | | critic_loss | 0.000137 | | ent_coef | 0.000477 | | ent_coef_loss | -0.449 | | learning_rate | 0.0003 | | n_updates | 37667 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.562 | | time/ | | | episodes | 1256 | | fps | 186 | | time_elapsed | 1612 | | total_timesteps | 301440 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.537 | | time/ | | | episodes | 1260 | | fps | 186 | | time_elapsed | 1628 | | total_timesteps | 303360 | | train/ | | | actor_loss | -1.09 | | critic_loss | 0.000123 | | ent_coef | 0.000483 | | ent_coef_loss | -0.205 | | learning_rate | 0.0003 | | n_updates | 37907 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.537 | | time/ | | | episodes | 1264 | | fps | 186 | | time_elapsed | 1628 | | total_timesteps | 303360 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.542 | | time/ | | | episodes | 1268 | | fps | 185 | | time_elapsed | 1646 | | total_timesteps | 305280 | | train/ | | | actor_loss | -1.06 | | critic_loss | 0.000138 | | ent_coef | 0.000488 | | ent_coef_loss | -0.528 | | learning_rate | 0.0003 | | n_updates | 38147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.542 | | time/ | | | episodes | 1272 | | fps | 185 | | time_elapsed | 1646 | | total_timesteps | 305280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.541 | | time/ | | | episodes | 1276 | | fps | 185 | | time_elapsed | 1655 | | total_timesteps | 307200 | | train/ | | | actor_loss | -1.05 | | critic_loss | 0.000173 | | ent_coef | 0.000497 | | ent_coef_loss | -0.317 | | 
learning_rate | 0.0003 | | n_updates | 38387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.541 | | time/ | | | episodes | 1280 | | fps | 185 | | time_elapsed | 1655 | | total_timesteps | 307200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.547 | | time/ | | | episodes | 1284 | | fps | 184 | | time_elapsed | 1671 | | total_timesteps | 309120 | | train/ | | | actor_loss | -1.03 | | critic_loss | 0.00016 | | ent_coef | 0.0005 | | ent_coef_loss | 0.467 | | learning_rate | 0.0003 | | n_updates | 38627 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.547 | | time/ | | | episodes | 1288 | | fps | 184 | | time_elapsed | 1671 | | total_timesteps | 309120 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.561 | | time/ | | | episodes | 1292 | | fps | 183 | | time_elapsed | 1696 | | total_timesteps | 311040 | | train/ | | | actor_loss | -1.01 | | critic_loss | 0.000138 | | ent_coef | 0.0005 | | ent_coef_loss | 0.127 | | learning_rate | 0.0003 | | n_updates | 38867 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.561 | | time/ | | | episodes | 1296 | | fps | 183 | | time_elapsed | 1696 | | total_timesteps | 311040 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.576 | | time/ | | | episodes | 1300 | | fps | 181 | | time_elapsed | 1722 | | total_timesteps | 312960 | | train/ | | | actor_loss | -0.994 | | critic_loss | 0.0002 | | ent_coef | 0.000502 | | ent_coef_loss | -0.31 | | learning_rate | 0.0003 | | n_updates | 39107 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 
240 | | ep_rew_mean | -0.576 | | time/ | | | episodes | 1304 | | fps | 181 | | time_elapsed | 1722 | | total_timesteps | 312960 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.584 | | time/ | | | episodes | 1308 | | fps | 180 | | time_elapsed | 1742 | | total_timesteps | 314880 | | train/ | | | actor_loss | -0.977 | | critic_loss | 0.000138 | | ent_coef | 0.00048 | | ent_coef_loss | -0.366 | | learning_rate | 0.0003 | | n_updates | 39347 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.584 | | time/ | | | episodes | 1312 | | fps | 180 | | time_elapsed | 1742 | | total_timesteps | 314880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.577 | | time/ | | | episodes | 1316 | | fps | 179 | | time_elapsed | 1762 | | total_timesteps | 316800 | | train/ | | | actor_loss | -0.964 | | critic_loss | 0.000152 | | ent_coef | 0.000458 | | ent_coef_loss | 1.03 | | learning_rate | 0.0003 | | n_updates | 39587 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.577 | | time/ | | | episodes | 1320 | | fps | 179 | | time_elapsed | 1762 | | total_timesteps | 316800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.578 | | time/ | | | episodes | 1324 | | fps | 179 | | time_elapsed | 1780 | | total_timesteps | 318720 | | train/ | | | actor_loss | -0.96 | | critic_loss | 0.000164 | | ent_coef | 0.000458 | | ent_coef_loss | 0.355 | | learning_rate | 0.0003 | | n_updates | 39827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.578 | | time/ | | | episodes | 1328 | | fps | 179 | | time_elapsed | 1780 | | total_timesteps | 318720 | 
--------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.566 | | time/ | | | episodes | 1332 | | fps | 178 | | time_elapsed | 1797 | | total_timesteps | 320640 | | train/ | | | actor_loss | -0.928 | | critic_loss | 0.000134 | | ent_coef | 0.000463 | | ent_coef_loss | 0.79 | | learning_rate | 0.0003 | | n_updates | 40067 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.566 | | time/ | | | episodes | 1336 | | fps | 178 | | time_elapsed | 1797 | | total_timesteps | 320640 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.566 | | time/ | | | episodes | 1340 | | fps | 177 | | time_elapsed | 1816 | | total_timesteps | 322560 | | train/ | | | actor_loss | -0.921 | | critic_loss | 0.000162 | | ent_coef | 0.000475 | | ent_coef_loss | -0.241 | | learning_rate | 0.0003 | | n_updates | 40307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.566 | | time/ | | | episodes | 1344 | | fps | 177 | | time_elapsed | 1816 | | total_timesteps | 322560 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.594 | | time/ | | | episodes | 1348 | | fps | 176 | | time_elapsed | 1837 | | total_timesteps | 324480 | | train/ | | | actor_loss | -0.906 | | critic_loss | 0.000146 | | ent_coef | 0.000474 | | ent_coef_loss | -0.523 | | learning_rate | 0.0003 | | n_updates | 40547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.594 | | time/ | | | episodes | 1352 | | fps | 176 | | time_elapsed | 1837 | | total_timesteps | 324480 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.597 | | 
time/ | | | episodes | 1356 | | fps | 175 | | time_elapsed | 1859 | | total_timesteps | 326400 | | train/ | | | actor_loss | -0.883 | | critic_loss | 0.000154 | | ent_coef | 0.000472 | | ent_coef_loss | -0.731 | | learning_rate | 0.0003 | | n_updates | 40787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.597 | | time/ | | | episodes | 1360 | | fps | 175 | | time_elapsed | 1859 | | total_timesteps | 326400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.596 | | time/ | | | episodes | 1364 | | fps | 174 | | time_elapsed | 1878 | | total_timesteps | 328320 | | train/ | | | actor_loss | -0.872 | | critic_loss | 0.000188 | | ent_coef | 0.000471 | | ent_coef_loss | 0.153 | | learning_rate | 0.0003 | | n_updates | 41027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.596 | | time/ | | | episodes | 1368 | | fps | 174 | | time_elapsed | 1878 | | total_timesteps | 328320 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.624 | | time/ | | | episodes | 1372 | | fps | 173 | | time_elapsed | 1899 | | total_timesteps | 330240 | | train/ | | | actor_loss | -0.865 | | critic_loss | 0.00013 | | ent_coef | 0.000475 | | ent_coef_loss | 0.33 | | learning_rate | 0.0003 | | n_updates | 41267 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.624 | | time/ | | | episodes | 1376 | | fps | 173 | | time_elapsed | 1899 | | total_timesteps | 330240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.62 | | time/ | | | episodes | 1380 | | fps | 172 | | time_elapsed | 1925 | | total_timesteps | 332160 | | train/ | | | actor_loss | -0.845 | | 
critic_loss | 0.000133 | | ent_coef | 0.000482 | | ent_coef_loss | 0.0183 | | learning_rate | 0.0003 | | n_updates | 41507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.62 | | time/ | | | episodes | 1384 | | fps | 172 | | time_elapsed | 1925 | | total_timesteps | 332160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.618 | | time/ | | | episodes | 1388 | | fps | 171 | | time_elapsed | 1945 | | total_timesteps | 334080 | | train/ | | | actor_loss | -0.831 | | critic_loss | 0.000133 | | ent_coef | 0.000477 | | ent_coef_loss | -0.128 | | learning_rate | 0.0003 | | n_updates | 41747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.618 | | time/ | | | episodes | 1392 | | fps | 171 | | time_elapsed | 1945 | | total_timesteps | 334080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.614 | | time/ | | | episodes | 1396 | | fps | 170 | | time_elapsed | 1968 | | total_timesteps | 336000 | | train/ | | | actor_loss | -0.82 | | critic_loss | 0.00014 | | ent_coef | 0.000469 | | ent_coef_loss | -0.19 | | learning_rate | 0.0003 | | n_updates | 41987 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.614 | | time/ | | | episodes | 1400 | | fps | 170 | | time_elapsed | 1968 | | total_timesteps | 336000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.618 | | time/ | | | episodes | 1404 | | fps | 169 | | time_elapsed | 1990 | | total_timesteps | 337920 | | train/ | | | actor_loss | -0.812 | | critic_loss | 0.000195 | | ent_coef | 0.000468 | | ent_coef_loss | -0.109 | | learning_rate | 0.0003 | | n_updates | 42227 | 
--------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.618 | | time/ | | | episodes | 1408 | | fps | 169 | | time_elapsed | 1990 | | total_timesteps | 337920 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.617 | | time/ | | | episodes | 1412 | | fps | 168 | | time_elapsed | 2011 | | total_timesteps | 339840 | | train/ | | | actor_loss | -0.796 | | critic_loss | 0.00021 | | ent_coef | 0.000462 | | ent_coef_loss | 0.0553 | | learning_rate | 0.0003 | | n_updates | 42467 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.617 | | time/ | | | episodes | 1416 | | fps | 168 | | time_elapsed | 2011 | | total_timesteps | 339840 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.631 | | time/ | | | episodes | 1420 | | fps | 168 | | time_elapsed | 2027 | | total_timesteps | 341760 | | train/ | | | actor_loss | -0.776 | | critic_loss | 0.000143 | | ent_coef | 0.000465 | | ent_coef_loss | 0.156 | | learning_rate | 0.0003 | | n_updates | 42707 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.631 | | time/ | | | episodes | 1424 | | fps | 168 | | time_elapsed | 2027 | | total_timesteps | 341760 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.656 | | time/ | | | episodes | 1428 | | fps | 167 | | time_elapsed | 2057 | | total_timesteps | 343680 | | train/ | | | actor_loss | -0.768 | | critic_loss | 0.000142 | | ent_coef | 0.000462 | | ent_coef_loss | -0.533 | | learning_rate | 0.0003 | | n_updates | 42947 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.656 | | 
time/ | | | episodes | 1432 | | fps | 167 | | time_elapsed | 2057 | | total_timesteps | 343680 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.654 | | time/ | | | episodes | 1436 | | fps | 166 | | time_elapsed | 2071 | | total_timesteps | 345600 | | train/ | | | actor_loss | -0.757 | | critic_loss | 0.000154 | | ent_coef | 0.000452 | | ent_coef_loss | -0.66 | | learning_rate | 0.0003 | | n_updates | 43187 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.654 | | time/ | | | episodes | 1440 | | fps | 166 | | time_elapsed | 2071 | | total_timesteps | 345600 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.637 | | time/ | | | episodes | 1444 | | fps | 166 | | time_elapsed | 2084 | | total_timesteps | 347520 | | train/ | | | actor_loss | -0.739 | | critic_loss | 0.000138 | | ent_coef | 0.000446 | | ent_coef_loss | -0.0561 | | learning_rate | 0.0003 | | n_updates | 43427 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.637 | | time/ | | | episodes | 1448 | | fps | 166 | | time_elapsed | 2084 | | total_timesteps | 347520 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.642 | | time/ | | | episodes | 1452 | | fps | 165 | | time_elapsed | 2110 | | total_timesteps | 349440 | | train/ | | | actor_loss | -0.733 | | critic_loss | 0.000172 | | ent_coef | 0.000432 | | ent_coef_loss | -0.553 | | learning_rate | 0.0003 | | n_updates | 43667 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.642 | | time/ | | | episodes | 1456 | | fps | 165 | | time_elapsed | 2110 | | total_timesteps | 349440 | --------------------------------- 
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.663 | | time/ | | | episodes | 1460 | | fps | 164 | | time_elapsed | 2133 | | total_timesteps | 351360 | | train/ | | | actor_loss | -0.72 | | critic_loss | 0.000119 | | ent_coef | 0.000407 | | ent_coef_loss | 0.237 | | learning_rate | 0.0003 | | n_updates | 43907 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.663 | | time/ | | | episodes | 1464 | | fps | 164 | | time_elapsed | 2133 | | total_timesteps | 351360 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.635 | | time/ | | | episodes | 1468 | | fps | 164 | | time_elapsed | 2149 | | total_timesteps | 353280 | | train/ | | | actor_loss | -0.706 | | critic_loss | 0.00018 | | ent_coef | 0.000407 | | ent_coef_loss | -0.0855 | | learning_rate | 0.0003 | | n_updates | 44147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.635 | | time/ | | | episodes | 1472 | | fps | 164 | | time_elapsed | 2149 | | total_timesteps | 353280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.648 | | time/ | | | episodes | 1476 | | fps | 163 | | time_elapsed | 2166 | | total_timesteps | 355200 | | train/ | | | actor_loss | -0.698 | | critic_loss | 0.000147 | | ent_coef | 0.000406 | | ent_coef_loss | -0.164 | | learning_rate | 0.0003 | | n_updates | 44387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.648 | | time/ | | | episodes | 1480 | | fps | 163 | | time_elapsed | 2166 | | total_timesteps | 355200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.634 | | time/ | | | episodes | 1484 | | fps | 
163 | | time_elapsed | 2184 | | total_timesteps | 357120 | | train/ | | | actor_loss | -0.684 | | critic_loss | 0.000203 | | ent_coef | 0.000398 | | ent_coef_loss | 0.87 | | learning_rate | 0.0003 | | n_updates | 44627 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.634 | | time/ | | | episodes | 1488 | | fps | 163 | | time_elapsed | 2184 | | total_timesteps | 357120 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.661 | | time/ | | | episodes | 1492 | | fps | 162 | | time_elapsed | 2202 | | total_timesteps | 359040 | | train/ | | | actor_loss | -0.66 | | critic_loss | 0.000134 | | ent_coef | 0.000391 | | ent_coef_loss | -0.339 | | learning_rate | 0.0003 | | n_updates | 44867 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.661 | | time/ | | | episodes | 1496 | | fps | 162 | | time_elapsed | 2202 | | total_timesteps | 359040 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 1500 | | fps | 162 | | time_elapsed | 2218 | | total_timesteps | 360960 | | train/ | | | actor_loss | -0.652 | | critic_loss | 0.000131 | | ent_coef | 0.000402 | | ent_coef_loss | 1.06 | | learning_rate | 0.0003 | | n_updates | 45107 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 1504 | | fps | 162 | | time_elapsed | 2218 | | total_timesteps | 360960 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.663 | | time/ | | | episodes | 1508 | | fps | 162 | | time_elapsed | 2235 | | total_timesteps | 362880 | | train/ | | | actor_loss | -0.649 | | critic_loss | 0.000168 | | ent_coef | 
0.000404 | | ent_coef_loss | 0.447 | | learning_rate | 0.0003 | | n_updates | 45347 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.663 | | time/ | | | episodes | 1512 | | fps | 162 | | time_elapsed | 2235 | | total_timesteps | 362880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.673 | | time/ | | | episodes | 1516 | | fps | 161 | | time_elapsed | 2254 | | total_timesteps | 364800 | | train/ | | | actor_loss | -0.63 | | critic_loss | 0.000174 | | ent_coef | 0.000417 | | ent_coef_loss | -0.336 | | learning_rate | 0.0003 | | n_updates | 45587 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.673 | | time/ | | | episodes | 1520 | | fps | 161 | | time_elapsed | 2254 | | total_timesteps | 364800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.673 | | time/ | | | episodes | 1524 | | fps | 161 | | time_elapsed | 2271 | | total_timesteps | 366720 | | train/ | | | actor_loss | -0.626 | | critic_loss | 0.000151 | | ent_coef | 0.00042 | | ent_coef_loss | 0.178 | | learning_rate | 0.0003 | | n_updates | 45827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.673 | | time/ | | | episodes | 1528 | | fps | 161 | | time_elapsed | 2271 | | total_timesteps | 366720 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.68 | | time/ | | | episodes | 1532 | | fps | 160 | | time_elapsed | 2291 | | total_timesteps | 368640 | | train/ | | | actor_loss | -0.604 | | critic_loss | 0.000169 | | ent_coef | 0.000429 | | ent_coef_loss | -0.458 | | learning_rate | 0.0003 | | n_updates | 46067 | --------------------------------- 
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.68 | | time/ | | | episodes | 1536 | | fps | 160 | | time_elapsed | 2291 | | total_timesteps | 368640 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.694 | | time/ | | | episodes | 1540 | | fps | 160 | | time_elapsed | 2315 | | total_timesteps | 370560 | | train/ | | | actor_loss | -0.605 | | critic_loss | 0.000138 | | ent_coef | 0.000434 | | ent_coef_loss | 0.347 | | learning_rate | 0.0003 | | n_updates | 46307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.694 | | time/ | | | episodes | 1544 | | fps | 160 | | time_elapsed | 2315 | | total_timesteps | 370560 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.717 | | time/ | | | episodes | 1548 | | fps | 159 | | time_elapsed | 2335 | | total_timesteps | 372480 | | train/ | | | actor_loss | -0.588 | | critic_loss | 0.000138 | | ent_coef | 0.00043 | | ent_coef_loss | 0.0899 | | learning_rate | 0.0003 | | n_updates | 46547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.717 | | time/ | | | episodes | 1552 | | fps | 159 | | time_elapsed | 2335 | | total_timesteps | 372480 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.684 | | time/ | | | episodes | 1556 | | fps | 159 | | time_elapsed | 2354 | | total_timesteps | 374400 | | train/ | | | actor_loss | -0.583 | | critic_loss | 0.000157 | | ent_coef | 0.000423 | | ent_coef_loss | -0.111 | | learning_rate | 0.0003 | | n_updates | 46787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.684 | | time/ | | | episodes | 1560 | | fps | 
159 | | time_elapsed | 2354 | | total_timesteps | 374400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.68 | | time/ | | | episodes | 1564 | | fps | 158 | | time_elapsed | 2371 | | total_timesteps | 376320 | | train/ | | | actor_loss | -0.566 | | critic_loss | 0.000149 | | ent_coef | 0.00042 | | ent_coef_loss | 0.0573 | | learning_rate | 0.0003 | | n_updates | 47027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.68 | | time/ | | | episodes | 1568 | | fps | 158 | | time_elapsed | 2371 | | total_timesteps | 376320 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.677 | | time/ | | | episodes | 1572 | | fps | 157 | | time_elapsed | 2396 | | total_timesteps | 378240 | | train/ | | | actor_loss | -0.557 | | critic_loss | 0.000153 | | ent_coef | 0.000429 | | ent_coef_loss | 0.361 | | learning_rate | 0.0003 | | n_updates | 47267 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.677 | | time/ | | | episodes | 1576 | | fps | 157 | | time_elapsed | 2396 | | total_timesteps | 378240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.677 | | time/ | | | episodes | 1580 | | fps | 157 | | time_elapsed | 2414 | | total_timesteps | 380160 | | train/ | | | actor_loss | -0.553 | | critic_loss | 0.000141 | | ent_coef | 0.000431 | | ent_coef_loss | 0.394 | | learning_rate | 0.0003 | | n_updates | 47507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.677 | | time/ | | | episodes | 1584 | | fps | 157 | | time_elapsed | 2414 | | total_timesteps | 380160 | --------------------------------- --------------------------------- | rollout/ 
| | | ep_len_mean | 240 | | ep_rew_mean | -0.678 | | time/ | | | episodes | 1588 | | fps | 157 | | time_elapsed | 2429 | | total_timesteps | 382080 | | train/ | | | actor_loss | -0.541 | | critic_loss | 0.000176 | | ent_coef | 0.000434 | | ent_coef_loss | 0.367 | | learning_rate | 0.0003 | | n_updates | 47747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.678 | | time/ | | | episodes | 1592 | | fps | 157 | | time_elapsed | 2429 | | total_timesteps | 382080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.651 | | time/ | | | episodes | 1596 | | fps | 156 | | time_elapsed | 2453 | | total_timesteps | 384000 | | train/ | | | actor_loss | -0.538 | | critic_loss | 0.000154 | | ent_coef | 0.000441 | | ent_coef_loss | 0.583 | | learning_rate | 0.0003 | | n_updates | 47987 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.651 | | time/ | | | episodes | 1600 | | fps | 156 | | time_elapsed | 2453 | | total_timesteps | 384000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.641 | | time/ | | | episodes | 1604 | | fps | 156 | | time_elapsed | 2468 | | total_timesteps | 385920 | | train/ | | | actor_loss | -0.52 | | critic_loss | 0.000147 | | ent_coef | 0.000444 | | ent_coef_loss | -0.347 | | learning_rate | 0.0003 | | n_updates | 48227 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.641 | | time/ | | | episodes | 1608 | | fps | 156 | | time_elapsed | 2468 | | total_timesteps | 385920 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.675 | | time/ | | | episodes | 1612 | | fps | 155 | | time_elapsed | 2492 | | 
total_timesteps | 387840 | | train/ | | | actor_loss | -0.503 | | critic_loss | 0.000144 | | ent_coef | 0.000448 | | ent_coef_loss | -0.476 | | learning_rate | 0.0003 | | n_updates | 48467 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.675 | | time/ | | | episodes | 1616 | | fps | 155 | | time_elapsed | 2492 | | total_timesteps | 387840 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.662 | | time/ | | | episodes | 1620 | | fps | 155 | | time_elapsed | 2514 | | total_timesteps | 389760 | | train/ | | | actor_loss | -0.501 | | critic_loss | 0.000129 | | ent_coef | 0.000447 | | ent_coef_loss | -0.998 | | learning_rate | 0.0003 | | n_updates | 48707 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.662 | | time/ | | | episodes | 1624 | | fps | 155 | | time_elapsed | 2514 | | total_timesteps | 389760 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.627 | | time/ | | | episodes | 1628 | | fps | 154 | | time_elapsed | 2532 | | total_timesteps | 391680 | | train/ | | | actor_loss | -0.489 | | critic_loss | 0.000146 | | ent_coef | 0.000451 | | ent_coef_loss | -0.241 | | learning_rate | 0.0003 | | n_updates | 48947 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.627 | | time/ | | | episodes | 1632 | | fps | 154 | | time_elapsed | 2532 | | total_timesteps | 391680 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.626 | | time/ | | | episodes | 1636 | | fps | 154 | | time_elapsed | 2543 | | total_timesteps | 393600 | | train/ | | | actor_loss | -0.488 | | critic_loss | 0.000142 | | ent_coef | 0.000449 | | ent_coef_loss | 
0.081 | | learning_rate | 0.0003 | | n_updates | 49187 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.626 | | time/ | | | episodes | 1640 | | fps | 154 | | time_elapsed | 2543 | | total_timesteps | 393600 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.631 | | time/ | | | episodes | 1644 | | fps | 154 | | time_elapsed | 2563 | | total_timesteps | 395520 | | train/ | | | actor_loss | -0.474 | | critic_loss | 0.000156 | | ent_coef | 0.00045 | | ent_coef_loss | -0.224 | | learning_rate | 0.0003 | | n_updates | 49427 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.631 | | time/ | | | episodes | 1648 | | fps | 154 | | time_elapsed | 2563 | | total_timesteps | 395520 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.621 | | time/ | | | episodes | 1652 | | fps | 153 | | time_elapsed | 2582 | | total_timesteps | 397440 | | train/ | | | actor_loss | -0.459 | | critic_loss | 0.000123 | | ent_coef | 0.000454 | | ent_coef_loss | -0.0447 | | learning_rate | 0.0003 | | n_updates | 49667 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.621 | | time/ | | | episodes | 1656 | | fps | 153 | | time_elapsed | 2582 | | total_timesteps | 397440 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.622 | | time/ | | | episodes | 1660 | | fps | 153 | | time_elapsed | 2597 | | total_timesteps | 399360 | | train/ | | | actor_loss | -0.454 | | critic_loss | 0.000152 | | ent_coef | 0.000458 | | ent_coef_loss | -0.368 | | learning_rate | 0.0003 | | n_updates | 49907 | --------------------------------- --------------------------------- | 
rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.622 | | time/ | | | episodes | 1664 | | fps | 153 | | time_elapsed | 2597 | | total_timesteps | 399360 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.603 | | time/ | | | episodes | 1668 | | fps | 153 | | time_elapsed | 2617 | | total_timesteps | 401280 | | train/ | | | actor_loss | -0.444 | | critic_loss | 0.000165 | | ent_coef | 0.000458 | | ent_coef_loss | 1.32 | | learning_rate | 0.0003 | | n_updates | 50147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.603 | | time/ | | | episodes | 1672 | | fps | 153 | | time_elapsed | 2617 | | total_timesteps | 401280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.607 | | time/ | | | episodes | 1676 | | fps | 152 | | time_elapsed | 2641 | | total_timesteps | 403200 | | train/ | | | actor_loss | -0.437 | | critic_loss | 0.000153 | | ent_coef | 0.000468 | | ent_coef_loss | 0.233 | | learning_rate | 0.0003 | | n_updates | 50387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.607 | | time/ | | | episodes | 1680 | | fps | 152 | | time_elapsed | 2641 | | total_timesteps | 403200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.609 | | time/ | | | episodes | 1684 | | fps | 151 | | time_elapsed | 2669 | | total_timesteps | 405120 | | train/ | | | actor_loss | -0.431 | | critic_loss | 0.000139 | | ent_coef | 0.000469 | | ent_coef_loss | 0.0106 | | learning_rate | 0.0003 | | n_updates | 50627 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.609 | | time/ | | | episodes | 1688 | | fps | 151 | | time_elapsed | 2669 | | 
total_timesteps | 405120 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.593 | | time/ | | | episodes | 1692 | | fps | 151 | | time_elapsed | 2686 | | total_timesteps | 407040 | | train/ | | | actor_loss | -0.411 | | critic_loss | 0.000146 | | ent_coef | 0.000472 | | ent_coef_loss | 0.559 | | learning_rate | 0.0003 | | n_updates | 50867 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.593 | | time/ | | | episodes | 1696 | | fps | 151 | | time_elapsed | 2686 | | total_timesteps | 407040 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.6 | | time/ | | | episodes | 1700 | | fps | 151 | | time_elapsed | 2706 | | total_timesteps | 408960 | | train/ | | | actor_loss | -0.411 | | critic_loss | 0.000128 | | ent_coef | 0.000487 | | ent_coef_loss | -0.827 | | learning_rate | 0.0003 | | n_updates | 51107 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.6 | | time/ | | | episodes | 1704 | | fps | 151 | | time_elapsed | 2706 | | total_timesteps | 408960 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.597 | | time/ | | | episodes | 1708 | | fps | 150 | | time_elapsed | 2725 | | total_timesteps | 410880 | | train/ | | | actor_loss | -0.393 | | critic_loss | 0.000159 | | ent_coef | 0.000482 | | ent_coef_loss | -0.573 | | learning_rate | 0.0003 | | n_updates | 51347 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.597 | | time/ | | | episodes | 1712 | | fps | 150 | | time_elapsed | 2725 | | total_timesteps | 410880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | 
ep_rew_mean | -0.584 | | time/ | | | episodes | 1716 | | fps | 150 | | time_elapsed | 2742 | | total_timesteps | 412800 | | train/ | | | actor_loss | -0.385 | | critic_loss | 0.000148 | | ent_coef | 0.000471 | | ent_coef_loss | 0.955 | | learning_rate | 0.0003 | | n_updates | 51587 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.584 | | time/ | | | episodes | 1720 | | fps | 150 | | time_elapsed | 2742 | | total_timesteps | 412800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.588 | | time/ | | | episodes | 1724 | | fps | 150 | | time_elapsed | 2759 | | total_timesteps | 414720 | | train/ | | | actor_loss | -0.382 | | critic_loss | 0.000138 | | ent_coef | 0.000476 | | ent_coef_loss | 0.035 | | learning_rate | 0.0003 | | n_updates | 51827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.588 | | time/ | | | episodes | 1728 | | fps | 150 | | time_elapsed | 2759 | | total_timesteps | 414720 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.612 | | time/ | | | episodes | 1732 | | fps | 149 | | time_elapsed | 2777 | | total_timesteps | 416640 | | train/ | | | actor_loss | -0.377 | | critic_loss | 0.00015 | | ent_coef | 0.000482 | | ent_coef_loss | 0.182 | | learning_rate | 0.0003 | | n_updates | 52067 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.612 | | time/ | | | episodes | 1736 | | fps | 149 | | time_elapsed | 2777 | | total_timesteps | 416640 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.59 | | time/ | | | episodes | 1740 | | fps | 149 | | time_elapsed | 2793 | | total_timesteps | 418560 | | train/ | | | 
actor_loss | -0.365 | | critic_loss | 0.000133 | | ent_coef | 0.000496 | | ent_coef_loss | 0.921 | | learning_rate | 0.0003 | | n_updates | 52307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.59 | | time/ | | | episodes | 1744 | | fps | 149 | | time_elapsed | 2793 | | total_timesteps | 418560 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.581 | | time/ | | | episodes | 1748 | | fps | 149 | | time_elapsed | 2810 | | total_timesteps | 420480 | | train/ | | | actor_loss | -0.347 | | critic_loss | 0.000186 | | ent_coef | 0.000503 | | ent_coef_loss | 0.841 | | learning_rate | 0.0003 | | n_updates | 52547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.581 | | time/ | | | episodes | 1752 | | fps | 149 | | time_elapsed | 2810 | | total_timesteps | 420480 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.579 | | time/ | | | episodes | 1756 | | fps | 149 | | time_elapsed | 2825 | | total_timesteps | 422400 | | train/ | | | actor_loss | -0.341 | | critic_loss | 0.000154 | | ent_coef | 0.000507 | | ent_coef_loss | -0.291 | | learning_rate | 0.0003 | | n_updates | 52787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.579 | | time/ | | | episodes | 1760 | | fps | 149 | | time_elapsed | 2825 | | total_timesteps | 422400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.584 | | time/ | | | episodes | 1764 | | fps | 149 | | time_elapsed | 2838 | | total_timesteps | 424320 | | train/ | | | actor_loss | -0.345 | | critic_loss | 0.000164 | | ent_coef | 0.000501 | | ent_coef_loss | -0.219 | | learning_rate | 0.0003 | | 
n_updates | 53027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.584 | | time/ | | | episodes | 1768 | | fps | 149 | | time_elapsed | 2838 | | total_timesteps | 424320 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.594 | | time/ | | | episodes | 1772 | | fps | 149 | | time_elapsed | 2855 | | total_timesteps | 426240 | | train/ | | | actor_loss | -0.328 | | critic_loss | 0.000162 | | ent_coef | 0.0005 | | ent_coef_loss | 0.381 | | learning_rate | 0.0003 | | n_updates | 53267 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.594 | | time/ | | | episodes | 1776 | | fps | 149 | | time_elapsed | 2855 | | total_timesteps | 426240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.612 | | time/ | | | episodes | 1780 | | fps | 149 | | time_elapsed | 2872 | | total_timesteps | 428160 | | train/ | | | actor_loss | -0.322 | | critic_loss | 0.000125 | | ent_coef | 0.000515 | | ent_coef_loss | -0.714 | | learning_rate | 0.0003 | | n_updates | 53507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.612 | | time/ | | | episodes | 1784 | | fps | 149 | | time_elapsed | 2872 | | total_timesteps | 428160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.612 | | time/ | | | episodes | 1788 | | fps | 148 | | time_elapsed | 2888 | | total_timesteps | 430080 | | train/ | | | actor_loss | -0.313 | | critic_loss | 0.000174 | | ent_coef | 0.000515 | | ent_coef_loss | 0.675 | | learning_rate | 0.0003 | | n_updates | 53747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | 
ep_rew_mean | -0.612 | | time/ | | | episodes | 1792 | | fps | 148 | | time_elapsed | 2888 | | total_timesteps | 430080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.605 | | time/ | | | episodes | 1796 | | fps | 148 | | time_elapsed | 2911 | | total_timesteps | 432000 | | train/ | | | actor_loss | -0.305 | | critic_loss | 0.000153 | | ent_coef | 0.000502 | | ent_coef_loss | -0.334 | | learning_rate | 0.0003 | | n_updates | 53987 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.605 | | time/ | | | episodes | 1800 | | fps | 148 | | time_elapsed | 2911 | | total_timesteps | 432000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.584 | | time/ | | | episodes | 1804 | | fps | 148 | | time_elapsed | 2927 | | total_timesteps | 433920 | | train/ | | | actor_loss | -0.287 | | critic_loss | 0.000152 | | ent_coef | 0.000503 | | ent_coef_loss | -0.482 | | learning_rate | 0.0003 | | n_updates | 54227 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.584 | | time/ | | | episodes | 1808 | | fps | 148 | | time_elapsed | 2927 | | total_timesteps | 433920 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.595 | | time/ | | | episodes | 1812 | | fps | 147 | | time_elapsed | 2947 | | total_timesteps | 435840 | | train/ | | | actor_loss | -0.293 | | critic_loss | 0.000125 | | ent_coef | 0.000509 | | ent_coef_loss | -1.37 | | learning_rate | 0.0003 | | n_updates | 54467 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.595 | | time/ | | | episodes | 1816 | | fps | 147 | | time_elapsed | 2948 | | total_timesteps | 435840 | 
--------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.608 | | time/ | | | episodes | 1820 | | fps | 147 | | time_elapsed | 2969 | | total_timesteps | 437760 | | train/ | | | actor_loss | -0.279 | | critic_loss | 0.000164 | | ent_coef | 0.000523 | | ent_coef_loss | -0.229 | | learning_rate | 0.0003 | | n_updates | 54707 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.608 | | time/ | | | episodes | 1824 | | fps | 147 | | time_elapsed | 2969 | | total_timesteps | 437760 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.588 | | time/ | | | episodes | 1828 | | fps | 147 | | time_elapsed | 2986 | | total_timesteps | 439680 | | train/ | | | actor_loss | -0.286 | | critic_loss | 0.000148 | | ent_coef | 0.000523 | | ent_coef_loss | 0.184 | | learning_rate | 0.0003 | | n_updates | 54947 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.588 | | time/ | | | episodes | 1832 | | fps | 147 | | time_elapsed | 2986 | | total_timesteps | 439680 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.587 | | time/ | | | episodes | 1836 | | fps | 147 | | time_elapsed | 3001 | | total_timesteps | 441600 | | train/ | | | actor_loss | -0.269 | | critic_loss | 0.000148 | | ent_coef | 0.000529 | | ent_coef_loss | 0.522 | | learning_rate | 0.0003 | | n_updates | 55187 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.587 | | time/ | | | episodes | 1840 | | fps | 147 | | time_elapsed | 3001 | | total_timesteps | 441600 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.615 | | 
time/ | | | episodes | 1844 | | fps | 147 | | time_elapsed | 3016 | | total_timesteps | 443520 | | train/ | | | actor_loss | -0.267 | | critic_loss | 0.000169 | | ent_coef | 0.000537 | | ent_coef_loss | -0.0198 | | learning_rate | 0.0003 | | n_updates | 55427 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.615 | | time/ | | | episodes | 1848 | | fps | 147 | | time_elapsed | 3016 | | total_timesteps | 443520 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.62 | | time/ | | | episodes | 1852 | | fps | 146 | | time_elapsed | 3032 | | total_timesteps | 445440 | | train/ | | | actor_loss | -0.256 | | critic_loss | 0.00016 | | ent_coef | 0.000522 | | ent_coef_loss | 0.126 | | learning_rate | 0.0003 | | n_updates | 55667 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.62 | | time/ | | | episodes | 1856 | | fps | 146 | | time_elapsed | 3032 | | total_timesteps | 445440 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.608 | | time/ | | | episodes | 1860 | | fps | 146 | | time_elapsed | 3046 | | total_timesteps | 447360 | | train/ | | | actor_loss | -0.244 | | critic_loss | 0.000167 | | ent_coef | 0.000522 | | ent_coef_loss | -0.091 | | learning_rate | 0.0003 | | n_updates | 55907 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.608 | | time/ | | | episodes | 1864 | | fps | 146 | | time_elapsed | 3046 | | total_timesteps | 447360 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.599 | | time/ | | | episodes | 1868 | | fps | 146 | | time_elapsed | 3058 | | total_timesteps | 449280 | | train/ | | | actor_loss | -0.246 | | 
critic_loss | 0.000153 | | ent_coef | 0.00052 | | ent_coef_loss | 0.408 | | learning_rate | 0.0003 | | n_updates | 56147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.599 | | time/ | | | episodes | 1872 | | fps | 146 | | time_elapsed | 3058 | | total_timesteps | 449280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.563 | | time/ | | | episodes | 1876 | | fps | 147 | | time_elapsed | 3068 | | total_timesteps | 451200 | | train/ | | | actor_loss | -0.227 | | critic_loss | 0.000234 | | ent_coef | 0.00052 | | ent_coef_loss | 0.478 | | learning_rate | 0.0003 | | n_updates | 56387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.563 | | time/ | | | episodes | 1880 | | fps | 147 | | time_elapsed | 3068 | | total_timesteps | 451200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.553 | | time/ | | | episodes | 1884 | | fps | 146 | | time_elapsed | 3084 | | total_timesteps | 453120 | | train/ | | | actor_loss | -0.235 | | critic_loss | 0.000136 | | ent_coef | 0.000516 | | ent_coef_loss | 0.0237 | | learning_rate | 0.0003 | | n_updates | 56627 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.553 | | time/ | | | episodes | 1888 | | fps | 146 | | time_elapsed | 3084 | | total_timesteps | 453120 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.561 | | time/ | | | episodes | 1892 | | fps | 146 | | time_elapsed | 3100 | | total_timesteps | 455040 | | train/ | | | actor_loss | -0.224 | | critic_loss | 0.000161 | | ent_coef | 0.000516 | | ent_coef_loss | -0.335 | | learning_rate | 0.0003 | | n_updates | 56867 | 
--------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.561 | | time/ | | | episodes | 1896 | | fps | 146 | | time_elapsed | 3100 | | total_timesteps | 455040 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.565 | | time/ | | | episodes | 1900 | | fps | 146 | | time_elapsed | 3128 | | total_timesteps | 456960 | | train/ | | | actor_loss | -0.218 | | critic_loss | 0.00015 | | ent_coef | 0.000508 | | ent_coef_loss | -0.563 | | learning_rate | 0.0003 | | n_updates | 57107 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.565 | | time/ | | | episodes | 1904 | | fps | 146 | | time_elapsed | 3128 | | total_timesteps | 456960 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.56 | | time/ | | | episodes | 1908 | | fps | 145 | | time_elapsed | 3151 | | total_timesteps | 458880 | | train/ | | | actor_loss | -0.204 | | critic_loss | 0.000181 | | ent_coef | 0.000519 | | ent_coef_loss | 0.29 | | learning_rate | 0.0003 | | n_updates | 57347 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.56 | | time/ | | | episodes | 1912 | | fps | 145 | | time_elapsed | 3151 | | total_timesteps | 458880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.56 | | time/ | | | episodes | 1916 | | fps | 145 | | time_elapsed | 3174 | | total_timesteps | 460800 | | train/ | | | actor_loss | -0.203 | | critic_loss | 0.000202 | | ent_coef | 0.000521 | | ent_coef_loss | -0.0305 | | learning_rate | 0.0003 | | n_updates | 57587 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.56 | | time/ | 
| | episodes | 1920 | | fps | 145 | | time_elapsed | 3174 | | total_timesteps | 460800 | ---------------------------------

Training log, one row per logged block (ep_len_mean = 240 and learning_rate = 0.0003 in every block; blocks logged without a train/ section have empty train columns):

| episodes | total_timesteps | time_elapsed | fps | ep_rew_mean | actor_loss | critic_loss | ent_coef | ent_coef_loss | n_updates |
|---|---|---|---|---|---|---|---|---|---|
| 1924 | 462720 | 3197 | 144 | -0.565 | -0.203 | 0.000125 | 0.000511 | 0.192 | 57827 |
| 1928 | 462720 | 3197 | 144 | -0.565 | | | | | |
| 1932 | 464640 | 3222 | 144 | -0.579 | -0.176 | 0.00021 | 0.000512 | -0.0791 | 58067 |
| 1936 | 464640 | 3222 | 144 | -0.579 | | | | | |
| 1940 | 466560 | 3244 | 143 | -0.549 | -0.181 | 0.000203 | 0.000524 | -0.012 | 58307 |
| 1944 | 466560 | 3244 | 143 | -0.549 | | | | | |
| 1948 | 468480 | 3265 | 143 | -0.532 | -0.17 | 0.00014 | 0.000553 | -0.506 | 58547 |
| 1952 | 468480 | 3265 | 143 | -0.532 | | | | | |
| 1956 | 470400 | 3288 | 143 | -0.534 | -0.17 | 0.000136 | 0.000559 | -0.81 | 58787 |
| 1960 | 470400 | 3288 | 143 | -0.534 | | | | | |
| 1964 | 472320 | 3310 | 142 | -0.548 | -0.173 | 0.000163 | 0.000559 | -0.505 | 59027 |
| 1968 | 472320 | 3310 | 142 | -0.548 | | | | | |
| 1972 | 474240 | 3333 | 142 | -0.548 | -0.158 | 0.000185 | 0.000566 | 0.85 | 59267 |
| 1976 | 474240 | 3333 | 142 | -0.548 | | | | | |
| 1980 | 476160 | 3356 | 141 | -0.563 | -0.155 | 0.000151 | 0.000568 | 0.393 | 59507 |
| 1984 | 476160 | 3356 | 141 | -0.563 | | | | | |
| 1988 | 478080 | 3379 | 141 | -0.57 | -0.158 | 0.000189 | 0.000577 | 0.86 | 59747 |
| 1992 | 478080 | 3380 | 141 | -0.57 | | | | | |
| 1996 | 480000 | 3397 | 141 | -0.569 | -0.141 | 0.000177 | 0.000572 | 0.326 | 59987 |
| 2000 | 480000 | 3397 | 141 | -0.569 | | | | | |
| 2004 | 481920 | 3416 | 141 | -0.562 | -0.134 | 0.000174 | 0.000574 | 0.434 | 60227 |
| 2008 | 481920 | 3416 | 141 | -0.562 | | | | | |
| 2012 | 483840 | 3437 | 140 | -0.595 | -0.132 | 0.000193 | 0.00058 | 0.521 | 60467 |
| 2016 | 483840 | 3437 | 140 | -0.595 | | | | | |
| 2020 | 485760 | 3460 | 140 | -0.646 | -0.121 | 0.000144 | 0.000555 | -1.03 | 60707 |
| 2024 | 485760 | 3460 | 140 | -0.646 | | | | | |
| 2028 | 487680 | 3489 | 139 | -0.688 | -0.116 | 0.000169 | 0.000538 | 2.2 | 60947 |
| 2032 | 487680 | 3489 | 139 | -0.688 | | | | | |
| 2036 | 489600 | 3507 | 139 | -0.711 | -0.11 | 0.000201 | 0.000516 | -0.758 | 61187 |
| 2040 | 489600 | 3507 | 139 | -0.711 | | | | | |
| 2044 | 491520 | 3526 | 139 | -0.8 | -0.109 | 0.000175 | 0.000497 | -0.924 | 61427 |
| 2048 | 491520 | 3526 | 139 | -0.8 | | | | | |
| 2052 | 493440 | 3548 | 139 | -0.851 | -0.109 | 0.000113 | 0.000485 | -0.102 | 61667 |
| 2056 | 493440 | 3548 | 139 | -0.851 | | | | | |
| 2060 | 495360 | 3572 | 138 | -0.898 | -0.102 | 0.000148 | 0.000485 | 0.205 | 61907 |
| 2064 | 495360 | 3572 | 138 | -0.898 | | | | | |
| 2068 | 497280 | 3596 | 138 | -0.918 | -0.102 | 0.000136 | 0.000512 | 0.161 | 62147 |
| 2072 | 497280 | 3596 | 138 | -0.918 | | | | | |
| 2076 | 499200 | 3615 | 138 | -0.926 | -0.0938 | 0.000121 | 0.000543 | 0.904 | 62387 |
| 2080 | 499200 | 3615 | 138 | -0.926 | | | | | |
| 2084 | 501120 | 3637 | 137 | -0.918 | -0.0856 | 0.000129 | 0.000578 | 0.256 | 62627 |
| 2088 | 501120 | 3637 | 137 | -0.918 | | | | | |
| 2092 | 503040 | 3658 | 137 | -0.932 | -0.0816 | 0.000121 | 0.000607 | 0.862 | 62867 |
| 2096 | 503040 | 3658 | 137 | -0.932 | | | | | |
| 2100 | 504960 | 3681 | 137 | -0.975 | -0.0804 | 0.000141 | 0.000622 | 0.183 | 63107 |
| 2104 | 504960 | 3681 | 137 | -0.975 | | | | | |
| 2108 | 506880 | 3699 | 137 | -0.971 | -0.0682 | 0.000138 | 0.000632 | 0.706 | 63347 |
| 2112 | 506880 | 3699 | 137 | -0.971 | | | | | |
| 2116 | 508800 | 3711 | 137 | -0.959 | -0.0691 | 0.000231 | 0.000639 | 0.877 | 63587 |
| 2120 | 508800 | 3711 | 137 | -0.959 | | | | | |
| 2124 | 510720 | 3725 | 137 | -0.873 | -0.0608 | 0.000186 | 0.000642 | -0.568 | 63827 |
| 2128 | 510720 | 3725 | 137 | -0.873 | | | | | |
| 2132 | 512640 | 3741 | 137 | -0.834 | -0.0626 | 0.000131 | 0.000649 | 0.304 | 64067 |
| 2136 | 512640 | 3741 | 137 | -0.834 | | | | | |
| 2140 | 514560 | 3763 | 136 | -0.843 | -0.0527 | 0.000196 | 0.000647 | -0.222 | 64307 |
| 2144 | 514560 | 3763 | 136 | -0.843 | | | | | |
| 2148 | 516480 | 3777 | 136 | -0.785 | -0.0432 | 0.000163 | 0.000625 | 0.217 | 64547 |
| 2152 | 516480 | 3777 | 136 | -0.785 | | | | | |
| 2156 | 518400 | 3797 | 136 | -0.789 | -0.0463 | 0.000331 | 0.000626 | 0.358 | 64787 |
| 2160 | 518400 | 3797 | 136 | -0.789 | | | | | |
| 2164 | 520320 | 3815 | 136 | -0.745 | -0.0337 | 0.000151 | 0.000647 | 0.661 | 65027 |
| 2168 | 520320 | 3815 | 136 | -0.745 | | | | | |
| 2172 | 522240 | 3837 | 136 | -0.733 | -0.0311 | 0.000129 | 0.000671 | 0.196 | 65267 |
| 2176 | 522240 | 3837 | 136 | -0.733 | | | | | |
| 2180 | 524160 | 3854 | 135 | -0.733 | -0.0374 | 0.000166 | 0.000667 | 0.199 | 65507 |
| 2184 | 524160 | 3854 | 135 | -0.733 | | | | | |
| 2188 | 526080 | 3872 | 135 | -0.743 | -0.0327 | 0.000164 | 0.000646 | 0.128 | 65747 |
| 2192 | 526080 | 3872 | 135 | -0.743 | | | | | |
| 2196 | 528000 | 3888 | 135 | -0.722 | -0.0219 | 0.000146 | 0.000635 | -0.172 | 65987 |
| 2200 | 528000 | 3888 | 135 | -0.722 | | | | | |
| 2204 | 529920 | 3907 | 135 | -0.681 | -0.0114 | 0.000146 | 0.000621 | -0.234 | 66227 |
| 2208 | 529920 | 3907 | 135 | -0.681 | | | | | |
| 2212 | 531840 | 3925 | 135 | -0.672 | -0.0165 | 0.000155 | 0.000609 | -0.0196 | 66467 |
| 2216 | 531840 | 3925 | 135 | -0.672 | | | | | |
| 2220 | 533760 | 3945 | 135 | -0.66 | 0.00815 | 0.00018 | 0.000611 | -0.205 | 66707 |
| 2224 | 533760 | 3945 | 135 | -0.66 | | | | | |
| 2228 | 535680 | 3963 | 135 | -0.664 | -0.00459 | 0.000174 | 0.000603 | 0.12 | 66947 |
| 2232 | 535680 | 3963 | 135 | -0.664 | | | | | |
| 2236 | 537600 | 3987 | 134 | -0.673 | -0.0133 | 0.000135 | 0.000603 | 0.209 | 67187 |
| 2240 | 537600 | 3987 | 134 | -0.673 | | | | | |
| 2244 | 539520 | 4005 | 134 | -0.636 | 0.000328 | 0.000175 | 0.000601 | -0.148 | 67427 |
| 2248 | 539520 | 4005 | 134 | -0.636 | | | | | |
| 2252 | 541440 | 4029 | 134 | -0.613 | 0.00742 | 0.000136 | 0.000591 | -0.822 | 67667 |
| 2256 | 541440 | 4029 | 134 | -0.613 | | | | | |
| 2260 | 543360 | 4049 | 134 | -0.613 | 0.0173 | 0.000125 | 0.000551 | 0.663 | 67907 |
| 2264 | 543360 | 4049 | 134 | -0.613 | | | | | |
| 2268 | 545280 | 4064 | 134 | -0.609 | 0.00892 | 0.000143 | 0.000546 | 0.468 | 68147 |
| 2272 | 545280 | 4064 | 134 | -0.609 | | | | | |
| 2276 | 547200 | 4075 | 134 | -0.618 | 0.0196 | 0.000168 | 0.000553 | -1.13 | 68387 |
| 2280 | 547200 | 4075 | 134 | -0.618 | | | | | |
| 2284 | 549120 | 4085 | 134 | -0.61 | 0.0207 | 0.000166 | 0.000545 | -0.143 | 68627 |
| 2288 | 549120 | 4085 | 134 | -0.61 | | | | | |
| 2292 | 551040 | 4100 | 134 | -0.62 | 0.0258 | 0.000172 | 0.000539 | -0.284 | 68867 |
| 2296 | 551040 | 4100 | 134 | -0.62 | | | | | |
| 2300 | 552960 | 4122 | 134 | -0.694 | 0.0309 | 0.00392 | 0.000528 | -0.215 | 69107 |
| 2304 | 552960 | 4122 | 134 | -0.694 | | | | | |
| 2308 | 554880 | 4141 | 133 | -4.68 | 0.0387 | 0.000148 | 0.000495 | -1.91 | 69347 |
| 2312 | 554880 | 4141 | 133 | -4.68 | | | | | |
| 2316 | 556800 | 4164 | 133 | -4.72 | 0.0322 | 0.000165 | 0.000488 | 0.517 | 69587 |
| 2320 | 556800 | 4164 | 133 | -4.72 | | | | | |
| 2324 | 558720 | 4181 | 133 | -4.73 | 0.0421 | 0.000124 | 0.000522 | 0.849 | 69827 |
| 2328 | 558720 | 4181 | 133 | -4.73 | | | | | |
| 2332 | 560640 | 4201 | 133 | -4.77 | 0.0426 | 0.000152 | 0.00055 | 1.01 | 70067 |
| 2336 | 560640 | 4201 | 133 | -4.77 | | | | | |
| 2340 | 562560 | 4218 | 133 | -4.76 | 0.0538 | 0.000347 | 0.000597 | 1.89 | 70307 |
| 2344 | 562560 | 4218 | 133 | -4.76 | | | | | |
| 2348 | 564480 | 4230 | 133 | -4.78 | 0.05 | 0.000168 | 0.000655 | 1.13 | 70547 |
| 2352 | 564480 | 4230 | 133 | -4.78 | | | | | |
| 2356 | 566400 | 4250 | 133 | -4.8 | 0.0609 | 0.000245 | 0.000725 | 1.39 | 70787 |
| 2360 | 566400 | 4250 | 133 | -4.8 | | | | | |
| 2364 | 568320 | 4281 | 132 | -4.79 | 0.104 | 0.000259 | 0.000801 | 2.39 | 71027 |
| 2368 | 568320 | 4281 | 132 | -4.79 | | | | | |
| 2372 | 570240 | 4306 | 132 | -4.79 | 0.147 | 0.0136 | 0.000877 | 1.49 | 71267 |
| 2376 | 570240 | 4306 | 132 | -4.79 | | | | | |
| 2380 | 572160 | 4331 | 132 | -4.81 | 0.152 | 0.00725 | 0.000944 | 1.8 | 71507 |
| 2384 | 572160 | 4331 | 132 | -4.81 | | | | | |
| 2388 | 574080 | 4356 | 131 | -4.8 | 0.0663 | 0.000394 | 0.00102 | 1.48 | 71747 |
| 2392 | 574080 | 4356 | 131 | -4.8 | | | | | |
| 2396 | 576000 | 4379 | 131 | -4.75 | 0.14 | 0.00255 | 0.00107 | 0.872 | 71987 |
| 2400 | 576000 | 4379 | 131 | -4.75 | | | | | |
| 2404 | 577920 | 4405 | 131 | -2.71 | 0.0772 | 0.000192 | 0.00114 | 0.608 | 72227 |
| 2408 | 577920 | 4405 | 131 | -2.71 | | | | | |
| 2412 | 579840 | 4430 | 130 | -0.753 | 0.0817 | 0.000246 | 0.0012 | 0.572 | 72467 |
| 2416 | 579840 | 4430 | 130 | -0.753 | | | | | |
| 2420 | 581760 | 4453 | 130 | -0.741 | 0.082 | 0.000152 | 0.00128 | 1.55 | 72707 |
| 2424 | 581760 | 4453 | 130 | -0.741 | | | | | |
| 2428 | 583680 | 4478 | 130 | -0.728 | 0.14 | 0.00169 | 0.00138 | 2.46 | 72947 |
| 2432 | 583680 | 4478 | 130 | -0.728 | | | | | |
| 2436 | 585600 | 4501 | 130 | -0.692 | 0.113 | 0.000648 | 0.0015 | 1.42 | 73187 |
| 2440 | 585600 | 4501 | 130 | -0.692 | | | | | |
| 2444 | 587520 | 4526 | 129 | -0.687 | 0.229 | 0.0044 | 0.00159 | 1.11 | 73427 |
| 2448 | 587520 | 4526 | 129 | -0.687 | | | | | |
| 2452 | 589440 | 4550 | 129 | -0.682 | 0.0977 | 0.000137 | 0.00161 | -1.39 | 73667 |
| 2456 | 589440 | 4550 | 129 | -0.682 | | | | | |

--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 2460 | | fps | 129 | | time_elapsed | 4574 | | total_timesteps | 591360 | 
train/ | | | actor_loss | 0.109 | | critic_loss | 0.000233 | | ent_coef | 0.00163 | | ent_coef_loss | 0.837 | | learning_rate | 0.0003 | | n_updates | 73907 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 2464 | | fps | 129 | | time_elapsed | 4574 | | total_timesteps | 591360 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.676 | | time/ | | | episodes | 2468 | | fps | 128 | | time_elapsed | 4599 | | total_timesteps | 593280 | | train/ | | | actor_loss | 0.131 | | critic_loss | 0.00168 | | ent_coef | 0.00168 | | ent_coef_loss | 0.272 | | learning_rate | 0.0003 | | n_updates | 74147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.676 | | time/ | | | episodes | 2472 | | fps | 128 | | time_elapsed | 4599 | | total_timesteps | 593280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.662 | | time/ | | | episodes | 2476 | | fps | 128 | | time_elapsed | 4614 | | total_timesteps | 595200 | | train/ | | | actor_loss | 0.124 | | critic_loss | 0.000224 | | ent_coef | 0.00171 | | ent_coef_loss | -0.603 | | learning_rate | 0.0003 | | n_updates | 74387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.662 | | time/ | | | episodes | 2480 | | fps | 128 | | time_elapsed | 4614 | | total_timesteps | 595200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 2484 | | fps | 128 | | time_elapsed | 4629 | | total_timesteps | 597120 | | train/ | | | actor_loss | 0.212 | | critic_loss | 0.00126 | | ent_coef | 0.00169 | | ent_coef_loss | 0.113 | | learning_rate | 0.0003 | | 
n_updates | 74627 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 2488 | | fps | 128 | | time_elapsed | 4629 | | total_timesteps | 597120 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.686 | | time/ | | | episodes | 2492 | | fps | 128 | | time_elapsed | 4645 | | total_timesteps | 599040 | | train/ | | | actor_loss | 0.279 | | critic_loss | 0.0017 | | ent_coef | 0.00165 | | ent_coef_loss | 0.456 | | learning_rate | 0.0003 | | n_updates | 74867 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.686 | | time/ | | | episodes | 2496 | | fps | 128 | | time_elapsed | 4645 | | total_timesteps | 599040 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.66 | | time/ | | | episodes | 2500 | | fps | 128 | | time_elapsed | 4659 | | total_timesteps | 600960 | | train/ | | | actor_loss | 0.211 | | critic_loss | 0.00735 | | ent_coef | 0.00157 | | ent_coef_loss | 0.577 | | learning_rate | 0.0003 | | n_updates | 75107 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.66 | | time/ | | | episodes | 2504 | | fps | 128 | | time_elapsed | 4659 | | total_timesteps | 600960 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.655 | | time/ | | | episodes | 2508 | | fps | 128 | | time_elapsed | 4673 | | total_timesteps | 602880 | | train/ | | | actor_loss | 0.216 | | critic_loss | 0.00286 | | ent_coef | 0.00159 | | ent_coef_loss | 3.48 | | learning_rate | 0.0003 | | n_updates | 75347 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.655 | 
| time/ | | | episodes | 2512 | | fps | 128 | | time_elapsed | 4673 | | total_timesteps | 602880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.672 | | time/ | | | episodes | 2516 | | fps | 128 | | time_elapsed | 4697 | | total_timesteps | 604800 | | train/ | | | actor_loss | 0.122 | | critic_loss | 0.000141 | | ent_coef | 0.00167 | | ent_coef_loss | -0.339 | | learning_rate | 0.0003 | | n_updates | 75587 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.672 | | time/ | | | episodes | 2520 | | fps | 128 | | time_elapsed | 4697 | | total_timesteps | 604800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.684 | | time/ | | | episodes | 2524 | | fps | 128 | | time_elapsed | 4733 | | total_timesteps | 606720 | | train/ | | | actor_loss | 0.497 | | critic_loss | 0.009 | | ent_coef | 0.00166 | | ent_coef_loss | 0.868 | | learning_rate | 0.0003 | | n_updates | 75827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.684 | | time/ | | | episodes | 2528 | | fps | 128 | | time_elapsed | 4733 | | total_timesteps | 606720 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.688 | | time/ | | | episodes | 2532 | | fps | 127 | | time_elapsed | 4757 | | total_timesteps | 608640 | | train/ | | | actor_loss | 0.141 | | critic_loss | 0.000193 | | ent_coef | 0.00162 | | ent_coef_loss | -0.902 | | learning_rate | 0.0003 | | n_updates | 76067 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.688 | | time/ | | | episodes | 2536 | | fps | 127 | | time_elapsed | 4757 | | total_timesteps | 608640 | --------------------------------- 
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.683 | | time/ | | | episodes | 2540 | | fps | 127 | | time_elapsed | 4776 | | total_timesteps | 610560 | | train/ | | | actor_loss | 0.194 | | critic_loss | 0.00354 | | ent_coef | 0.00159 | | ent_coef_loss | 0.6 | | learning_rate | 0.0003 | | n_updates | 76307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.683 | | time/ | | | episodes | 2544 | | fps | 127 | | time_elapsed | 4776 | | total_timesteps | 610560 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.688 | | time/ | | | episodes | 2548 | | fps | 127 | | time_elapsed | 4787 | | total_timesteps | 612480 | | train/ | | | actor_loss | 0.227 | | critic_loss | 0.00134 | | ent_coef | 0.00159 | | ent_coef_loss | 2.27 | | learning_rate | 0.0003 | | n_updates | 76547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.688 | | time/ | | | episodes | 2552 | | fps | 127 | | time_elapsed | 4787 | | total_timesteps | 612480 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.681 | | time/ | | | episodes | 2556 | | fps | 128 | | time_elapsed | 4799 | | total_timesteps | 614400 | | train/ | | | actor_loss | 0.155 | | critic_loss | 0.000153 | | ent_coef | 0.00158 | | ent_coef_loss | -0.485 | | learning_rate | 0.0003 | | n_updates | 76787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.681 | | time/ | | | episodes | 2560 | | fps | 128 | | time_elapsed | 4799 | | total_timesteps | 614400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.678 | | time/ | | | episodes | 2564 | | fps | 127 | | 
time_elapsed | 4820 | | total_timesteps | 616320 | | train/ | | | actor_loss | 0.189 | | critic_loss | 0.00199 | | ent_coef | 0.00152 | | ent_coef_loss | -1 | | learning_rate | 0.0003 | | n_updates | 77027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.678 | | time/ | | | episodes | 2568 | | fps | 127 | | time_elapsed | 4820 | | total_timesteps | 616320 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.679 | | time/ | | | episodes | 2572 | | fps | 127 | | time_elapsed | 4833 | | total_timesteps | 618240 | | train/ | | | actor_loss | 0.157 | | critic_loss | 0.000565 | | ent_coef | 0.00143 | | ent_coef_loss | -2 | | learning_rate | 0.0003 | | n_updates | 77267 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.679 | | time/ | | | episodes | 2576 | | fps | 127 | | time_elapsed | 4833 | | total_timesteps | 618240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.665 | | time/ | | | episodes | 2580 | | fps | 128 | | time_elapsed | 4844 | | total_timesteps | 620160 | | train/ | | | actor_loss | 0.156 | | critic_loss | 0.000151 | | ent_coef | 0.00137 | | ent_coef_loss | -1.37 | | learning_rate | 0.0003 | | n_updates | 77507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.665 | | time/ | | | episodes | 2584 | | fps | 128 | | time_elapsed | 4844 | | total_timesteps | 620160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.65 | | time/ | | | episodes | 2588 | | fps | 128 | | time_elapsed | 4853 | | total_timesteps | 622080 | | train/ | | | actor_loss | 0.193 | | critic_loss | 0.000518 | | ent_coef | 0.0013 | | 
ent_coef_loss | -0.071 | | learning_rate | 0.0003 | | n_updates | 77747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.65 | | time/ | | | episodes | 2592 | | fps | 128 | | time_elapsed | 4853 | | total_timesteps | 622080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.643 | | time/ | | | episodes | 2596 | | fps | 128 | | time_elapsed | 4863 | | total_timesteps | 624000 | | train/ | | | actor_loss | 0.161 | | critic_loss | 0.000127 | | ent_coef | 0.0013 | | ent_coef_loss | -0.588 | | learning_rate | 0.0003 | | n_updates | 77987 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.643 | | time/ | | | episodes | 2600 | | fps | 128 | | time_elapsed | 4863 | | total_timesteps | 624000 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.639 | | time/ | | | episodes | 2604 | | fps | 128 | | time_elapsed | 4873 | | total_timesteps | 625920 | | train/ | | | actor_loss | 0.185 | | critic_loss | 0.000252 | | ent_coef | 0.00123 | | ent_coef_loss | -0.801 | | learning_rate | 0.0003 | | n_updates | 78227 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.639 | | time/ | | | episodes | 2608 | | fps | 128 | | time_elapsed | 4873 | | total_timesteps | 625920 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 2612 | | fps | 128 | | time_elapsed | 4882 | | total_timesteps | 627840 | | train/ | | | actor_loss | 0.178 | | critic_loss | 0.000195 | | ent_coef | 0.00124 | | ent_coef_loss | -0.817 | | learning_rate | 0.0003 | | n_updates | 78467 | --------------------------------- --------------------------------- 
| rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.67 | | time/ | | | episodes | 2616 | | fps | 128 | | time_elapsed | 4882 | | total_timesteps | 627840 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.657 | | time/ | | | episodes | 2620 | | fps | 128 | | time_elapsed | 4892 | | total_timesteps | 629760 | | train/ | | | actor_loss | 0.299 | | critic_loss | 0.00409 | | ent_coef | 0.00121 | | ent_coef_loss | -0.767 | | learning_rate | 0.0003 | | n_updates | 78707 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.657 | | time/ | | | episodes | 2624 | | fps | 128 | | time_elapsed | 4892 | | total_timesteps | 629760 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.658 | | time/ | | | episodes | 2628 | | fps | 128 | | time_elapsed | 4902 | | total_timesteps | 631680 | | train/ | | | actor_loss | 0.178 | | critic_loss | 0.000184 | | ent_coef | 0.00116 | | ent_coef_loss | -2.12 | | learning_rate | 0.0003 | | n_updates | 78947 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.658 | | time/ | | | episodes | 2632 | | fps | 128 | | time_elapsed | 4902 | | total_timesteps | 631680 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.645 | | time/ | | | episodes | 2636 | | fps | 128 | | time_elapsed | 4911 | | total_timesteps | 633600 | | train/ | | | actor_loss | 0.175 | | critic_loss | 0.000115 | | ent_coef | 0.00112 | | ent_coef_loss | -0.991 | | learning_rate | 0.0003 | | n_updates | 79187 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.645 | | time/ | | | episodes | 2640 | | fps | 128 | | time_elapsed | 4911 | | 
total_timesteps | 633600 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.628 | | time/ | | | episodes | 2644 | | fps | 129 | | time_elapsed | 4921 | | total_timesteps | 635520 | | train/ | | | actor_loss | 0.18 | | critic_loss | 0.000144 | | ent_coef | 0.00112 | | ent_coef_loss | -0.0348 | | learning_rate | 0.0003 | | n_updates | 79427 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.628 | | time/ | | | episodes | 2648 | | fps | 129 | | time_elapsed | 4921 | | total_timesteps | 635520 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.628 | | time/ | | | episodes | 2652 | | fps | 128 | | time_elapsed | 4967 | | total_timesteps | 637440 | | train/ | | | actor_loss | 0.191 | | critic_loss | 0.00167 | | ent_coef | 0.00109 | | ent_coef_loss | -0.731 | | learning_rate | 0.0003 | | n_updates | 79667 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.628 | | time/ | | | episodes | 2656 | | fps | 128 | | time_elapsed | 4967 | | total_timesteps | 637440 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.644 | | time/ | | | episodes | 2660 | | fps | 128 | | time_elapsed | 4980 | | total_timesteps | 639360 | | train/ | | | actor_loss | 0.229 | | critic_loss | 0.00158 | | ent_coef | 0.00104 | | ent_coef_loss | -1.11 | | learning_rate | 0.0003 | | n_updates | 79907 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.644 | | time/ | | | episodes | 2664 | | fps | 128 | | time_elapsed | 4980 | | total_timesteps | 639360 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | 
ep_rew_mean | -0.642 | | time/ | | | episodes | 2668 | | fps | 128 | | time_elapsed | 4994 | | total_timesteps | 641280 | | train/ | | | actor_loss | 0.312 | | critic_loss | 0.00113 | | ent_coef | 0.00106 | | ent_coef_loss | 0.999 | | learning_rate | 0.0003 | | n_updates | 80147 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.642 | | time/ | | | episodes | 2672 | | fps | 128 | | time_elapsed | 4994 | | total_timesteps | 641280 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.628 | | time/ | | | episodes | 2676 | | fps | 128 | | time_elapsed | 5007 | | total_timesteps | 643200 | | train/ | | | actor_loss | 0.234 | | critic_loss | 0.00935 | | ent_coef | 0.00106 | | ent_coef_loss | 0.531 | | learning_rate | 0.0003 | | n_updates | 80387 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.628 | | time/ | | | episodes | 2680 | | fps | 128 | | time_elapsed | 5007 | | total_timesteps | 643200 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.641 | | time/ | | | episodes | 2684 | | fps | 128 | | time_elapsed | 5032 | | total_timesteps | 645120 | | train/ | | | actor_loss | 0.232 | | critic_loss | 0.00371 | | ent_coef | 0.00104 | | ent_coef_loss | 1.19 | | learning_rate | 0.0003 | | n_updates | 80627 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.641 | | time/ | | | episodes | 2688 | | fps | 128 | | time_elapsed | 5032 | | total_timesteps | 645120 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.649 | | time/ | | | episodes | 2692 | | fps | 127 | | time_elapsed | 5065 | | total_timesteps | 647040 | | train/ | | | 
actor_loss | 0.201 | | critic_loss | 0.000128 | | ent_coef | 0.00103 | | ent_coef_loss | -1.18 | | learning_rate | 0.0003 | | n_updates | 80867 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.649 | | time/ | | | episodes | 2696 | | fps | 127 | | time_elapsed | 5065 | | total_timesteps | 647040 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.641 | | time/ | | | episodes | 2700 | | fps | 127 | | time_elapsed | 5079 | | total_timesteps | 648960 | | train/ | | | actor_loss | 0.201 | | critic_loss | 0.000139 | | ent_coef | 0.00103 | | ent_coef_loss | -0.33 | | learning_rate | 0.0003 | | n_updates | 81107 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.641 | | time/ | | | episodes | 2704 | | fps | 127 | | time_elapsed | 5079 | | total_timesteps | 648960 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.627 | | time/ | | | episodes | 2708 | | fps | 127 | | time_elapsed | 5093 | | total_timesteps | 650880 | | train/ | | | actor_loss | 0.3 | | critic_loss | 0.000214 | | ent_coef | 0.00103 | | ent_coef_loss | 0.0361 | | learning_rate | 0.0003 | | n_updates | 81347 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.627 | | time/ | | | episodes | 2712 | | fps | 127 | | time_elapsed | 5093 | | total_timesteps | 650880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.598 | | time/ | | | episodes | 2716 | | fps | 127 | | time_elapsed | 5108 | | total_timesteps | 652800 | | train/ | | | actor_loss | 0.21 | | critic_loss | 0.00328 | | ent_coef | 0.00102 | | ent_coef_loss | -0.714 | | learning_rate | 0.0003 | | n_updates | 81587 
| --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.598 | | time/ | | | episodes | 2720 | | fps | 127 | | time_elapsed | 5108 | | total_timesteps | 652800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.587 | | time/ | | | episodes | 2724 | | fps | 127 | | time_elapsed | 5130 | | total_timesteps | 654720 | | train/ | | | actor_loss | 0.231 | | critic_loss | 0.00179 | | ent_coef | 0.00102 | | ent_coef_loss | 2.39 | | learning_rate | 0.0003 | | n_updates | 81827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.587 | | time/ | | | episodes | 2728 | | fps | 127 | | time_elapsed | 5130 | | total_timesteps | 654720 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.599 | | time/ | | | episodes | 2732 | | fps | 127 | | time_elapsed | 5149 | | total_timesteps | 656640 | | train/ | | | actor_loss | 0.236 | | critic_loss | 0.00133 | | ent_coef | 0.00104 | | ent_coef_loss | 0.0109 | | learning_rate | 0.0003 | | n_updates | 82067 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.599 | | time/ | | | episodes | 2736 | | fps | 127 | | time_elapsed | 5149 | | total_timesteps | 656640 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.607 | | time/ | | | episodes | 2740 | | fps | 127 | | time_elapsed | 5169 | | total_timesteps | 658560 | | train/ | | | actor_loss | 0.217 | | critic_loss | 0.00074 | | ent_coef | 0.000992 | | ent_coef_loss | -1.58 | | learning_rate | 0.0003 | | n_updates | 82307 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.607 | | time/ | | 
| episodes | 2744 | | fps | 127 | | time_elapsed | 5169 | | total_timesteps | 658560 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.633 | | time/ | | | episodes | 2748 | | fps | 127 | | time_elapsed | 5187 | | total_timesteps | 660480 | | train/ | | | actor_loss | 0.284 | | critic_loss | 0.000285 | | ent_coef | 0.000928 | | ent_coef_loss | 0.592 | | learning_rate | 0.0003 | | n_updates | 82547 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.633 | | time/ | | | episodes | 2752 | | fps | 127 | | time_elapsed | 5187 | | total_timesteps | 660480 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.626 | | time/ | | | episodes | 2756 | | fps | 127 | | time_elapsed | 5200 | | total_timesteps | 662400 | | train/ | | | actor_loss | 0.258 | | critic_loss | 0.00257 | | ent_coef | 0.000908 | | ent_coef_loss | -0.152 | | learning_rate | 0.0003 | | n_updates | 82787 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.626 | | time/ | | | episodes | 2760 | | fps | 127 | | time_elapsed | 5200 | | total_timesteps | 662400 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.635 | | time/ | | | episodes | 2764 | | fps | 127 | | time_elapsed | 5214 | | total_timesteps | 664320 | | train/ | | | actor_loss | 0.232 | | critic_loss | 0.000181 | | ent_coef | 0.000902 | | ent_coef_loss | -3.04 | | learning_rate | 0.0003 | | n_updates | 83027 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.635 | | time/ | | | episodes | 2768 | | fps | 127 | | time_elapsed | 5214 | | total_timesteps | 664320 | --------------------------------- 
--------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.65 | | time/ | | | episodes | 2772 | | fps | 127 | | time_elapsed | 5230 | | total_timesteps | 666240 | | train/ | | | actor_loss | 0.269 | | critic_loss | 0.000213 | | ent_coef | 0.000858 | | ent_coef_loss | -0.235 | | learning_rate | 0.0003 | | n_updates | 83267 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.65 | | time/ | | | episodes | 2776 | | fps | 127 | | time_elapsed | 5230 | | total_timesteps | 666240 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.645 | | time/ | | | episodes | 2780 | | fps | 127 | | time_elapsed | 5256 | | total_timesteps | 668160 | | train/ | | | actor_loss | 0.233 | | critic_loss | 0.00418 | | ent_coef | 0.000861 | | ent_coef_loss | 1.25 | | learning_rate | 0.0003 | | n_updates | 83507 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.645 | | time/ | | | episodes | 2784 | | fps | 127 | | time_elapsed | 5256 | | total_timesteps | 668160 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.637 | | time/ | | | episodes | 2788 | | fps | 127 | | time_elapsed | 5274 | | total_timesteps | 670080 | | train/ | | | actor_loss | 0.227 | | critic_loss | 0.000194 | | ent_coef | 0.00092 | | ent_coef_loss | -0.5 | | learning_rate | 0.0003 | | n_updates | 83747 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.637 | | time/ | | | episodes | 2792 | | fps | 127 | | time_elapsed | 5274 | | total_timesteps | 670080 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.655 | | time/ | | | episodes | 2796 | | fps | 126 | | 
time_elapsed | 5293 | | total_timesteps | 672000 | | train/ | | | actor_loss | 0.226 | | critic_loss | 0.000148 | | ent_coef | 0.000961 | | ent_coef_loss | -0.861 | | learning_rate | 0.0003 | | n_updates | 83987 | ---------------------------------

| episodes | ep_len_mean | ep_rew_mean | fps | time_elapsed | total_timesteps | actor_loss | critic_loss | ent_coef | ent_coef_loss | learning_rate | n_updates |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2800 | 240 | -0.655 | 126 | 5293 | 672000 | | | | | | |
| 2804 | 240 | -0.649 | 126 | 5313 | 673920 | 0.402 | 0.00269 | 0.000972 | 0.804 | 0.0003 | 84227 |
| 2808 | 240 | -0.649 | 126 | 5313 | 673920 | | | | | | |
| 2812 | 240 | -0.655 | 126 | 5332 | 675840 | 0.241 | 0.000145 | 0.00099 | -0.858 | 0.0003 | 84467 |
| 2816 | 240 | -0.655 | 126 | 5332 | 675840 | | | | | | |
| 2820 | 240 | -0.653 | 126 | 5351 | 677760 | 0.382 | 0.00418 | 0.000973 | 0.396 | 0.0003 | 84707 |
| 2824 | 240 | -0.653 | 126 | 5351 | 677760 | | | | | | |
| 2828 | 240 | -0.69 | 126 | 5370 | 679680 | 0.26 | 0.000276 | 0.00094 | -2.17 | 0.0003 | 84947 |
| 2832 | 240 | -0.69 | 126 | 5370 | 679680 | | | | | | |
| 2836 | 240 | -0.687 | 126 | 5389 | 681600 | 0.406 | 0.0112 | 0.000911 | 1.72 | 0.0003 | 85187 |
| 2840 | 240 | -0.687 | 126 | 5389 | 681600 | | | | | | |
| 2844 | 240 | -0.687 | 126 | 5408 | 683520 | 0.264 | 0.00738 | 0.000924 | -1.25 | 0.0003 | 85427 |
| 2848 | 240 | -0.687 | 126 | 5408 | 683520 | | | | | | |
| 2852 | 240 | -0.669 | 126 | 5426 | 685440 | 0.252 | 0.000202 | 0.000945 | 0.101 | 0.0003 | 85667 |
| 2856 | 240 | -0.669 | 126 | 5426 | 685440 | | | | | | |
| 2860 | 240 | -0.65 | 126 | 5445 | 687360 | 0.254 | 0.000191 | 0.00095 | -0.295 | 0.0003 | 85907 |
| 2864 | 240 | -0.65 | 126 | 5445 | 687360 | | | | | | |
| 2868 | 240 | -0.639 | 126 | 5465 | 689280 | 0.248 | 0.000181 | 0.000974 | -0.765 | 0.0003 | 86147 |
| 2872 | 240 | -0.639 | 126 | 5465 | 689280 | | | | | | |
| 2876 | 240 | -0.647 | 126 | 5484 | 691200 | 0.275 | 0.00184 | 0.000966 | -0.0676 | 0.0003 | 86387 |
| 2880 | 240 | -0.647 | 126 | 5484 | 691200 | | | | | | |
| 2884 | 240 | -0.637 | 125 | 5503 | 693120 | 0.265 | 0.000844 | 0.000938 | 0.164 | 0.0003 | 86627 |
| 2888 | 240 | -0.637 | 125 | 5503 | 693120 | | | | | | |
| 2892 | 240 | -0.638 | 125 | 5522 | 695040 | 0.343 | 0.0137 | 0.000936 | 1.08 | 0.0003 | 86867 |
| 2896 | 240 | -0.638 | 125 | 5522 | 695040 | | | | | | |
| 2900 | 240 | -0.63 | 125 | 5549 | 696960 | 0.263 | 0.000298 | 0.000947 | 0.242 | 0.0003 | 87107 |
| 2904 | 240 | -0.63 | 125 | 5549 | 696960 | | | | | | |
| 2908 | 240 | -0.616 | 125 | 5572 | 698880 | 0.292 | 0.00486 | 0.00101 | 0.812 | 0.0003 | 87347 |
| 2912 | 240 | -0.616 | 125 | 5572 | 698880 | | | | | | |
| 2916 | 240 | -0.603 | 125 | 5594 | 700800 | 0.316 | 0.000438 | 0.000966 | -1.37 | 0.0003 | 87587 |
| 2920 | 240 | -0.603 | 125 | 5594 | 700800 | | | | | | |
| 2924 | 240 | -0.587 | 125 | 5618 | 702720 | 0.267 | 0.00015 | 0.000965 | -0.589 | 0.0003 | 87827 |
| 2928 | 240 | -0.587 | 125 | 5618 | 702720 | | | | | | |
| 2932 | 240 | -0.571 | 124 | 5638 | 704640 | 0.269 | 0.000234 | 0.000953 | -0.357 | 0.0003 | 88067 |
| 2936 | 240 | -0.571 | 124 | 5638 | 704640 | | | | | | |
| 2940 | 240 | -0.564 | 124 | 5662 | 706560 | 0.263 | 0.000139 | 0.000985 | -0.236 | 0.0003 | 88307 |
| 2944 | 240 | -0.564 | 124 | 5662 | 706560 | | | | | | |
| 2948 | 240 | -0.563 | 124 | 5684 | 708480 | 0.275 | 0.00016 | 0.000989 | -0.611 | 0.0003 | 88547 |
| 2952 | 240 | -0.563 | 124 | 5684 | 708480 | | | | | | |
| 2956 | 240 | -0.582 | 124 | 5704 | 710400 | 0.275 | 0.00017 | 0.000968 | -0.47 | 0.0003 | 88787 |
| 2960 | 240 | -0.582 | 124 | 5704 | 710400 | | | | | | |
| 2964 | 240 | -0.584 | 124 | 5725 | 712320 | 0.279 | 0.000187 | 0.000994 | -1.38 | 0.0003 | 89027 |
| 2968 | 240 | -0.584 | 124 | 5725 | 712320 | | | | | | |
| 2972 | 240 | -0.568 | 124 | 5746 | 714240 | 0.299 | 0.005 | 0.00102 | 1.38 | 0.0003 | 89267 |
| 2976 | 240 | -0.568 | 124 | 5746 | 714240 | | | | | | |
| 2980 | 240 | -0.568 | 124 | 5771 | 716160 | 0.283 | 0.000417 | 0.00106 | 0.303 | 0.0003 | 89507 |
| 2984 | 240 | -0.568 | 124 | 5771 | 716160 | | | | | | |
| 2988 | 240 | -0.57 | 123 | 5792 | 718080 | 0.371 | 0.0132 | 0.00106 | 0.593 | 0.0003 | 89747 |
| 2992 | 240 | -0.57 | 123 | 5792 | 718080 | | | | | | |
| 2996 | 240 | -0.567 | 124 | 5804 | 720000 | 0.303 | 0.000545 | 0.00105 | 1.27 | 0.0003 | 89987 |
| 3000 | 240 | -0.567 | 124 | 5804 | 720000 | | | | | | |
| 3004 | 240 | -0.569 | 123 | 5823 | 721920 | 0.35 | 0.0103 | 0.00106 | 0.755 | 0.0003 | 90227 |
| 3008 | 240 | -0.569 | 123 | 5823 | 721920 | | | | | | |
| 3012 | 240 | -0.581 | 123 | 5845 | 723840 | 0.297 | 0.000179 | 0.00105 | 0.26 | 0.0003 | 90467 |
| 3016 | 240 | -0.581 | 123 | 5845 | 723840 | | | | | | |
| 3020 | 240 | -0.577 | 123 | 5866 | 725760 | 0.29 | 0.000504 | 0.00104 | -1.06 | 0.0003 | 90707 |
| 3024 | 240 | -0.577 | 123 | 5866 | 725760 | | | | | | |
| 3028 | 240 | -0.572 | 123 | 5886 | 727680 | 0.323 | 0.000281 | 0.001 | -0.694 | 0.0003 | 90947 |
| 3032 | 240 | -0.572 | 123 | 5886 | 727680 | | | | | | |
| 3036 | 240 | -0.574 | 123 | 5906 | 729600 | 0.395 | 0.0135 | 0.00095 | -0.466 | 0.0003 | 91187 |
| 3040 | 240 | -0.574 | 123 | 5906 | 729600 | | | | | | |
| 3044 | 240 | -0.57 | 123 | 5929 | 731520 | 0.291 | 0.000193 | 0.000929 | -1.38 | 0.0003 | 91427 |
| 3048 | 240 | -0.57 | 123 | 5929 | 731520 | | | | | | |
| 3052 | 240 | -0.573 | 123 | 5951 | 733440 | 0.363 | 0.0103 | 0.000917 | 2.82 | 0.0003 | 91667 |
| 3056 | 240 | -0.573 | 123 | 5951 | 733440 | | | | | | |
| 3060 | 240 | -0.558 | 123 | 5977 | 735360 | 0.299 | 0.000224 | 0.000891 | -1.06 | 0.0003 | 91907 |
| 3064 | 240 | -0.558 | 123 | 5977 | 735360 | | | | | | |
| 3068 | 240 | -0.57 | 122 | 6001 | 737280 | 0.337 | 0.00324 | 0.000857 | 0.765 | 0.0003 | 92147 |
| 3072 | 240 | -0.57 | 122 | 6001 | 737280 | | | | | | |
| 3076 | 240 | -0.588 | 122 | 6032 | 739200 | 0.303 | 0.000161 | 0.000836 | -0.286 | 0.0003 | 92387 |
| 3080 | 240 | -0.588 | 122 | 6032 | 739200 | | | | | | |
| 3084 | 240 | -0.582 | 122 | 6052 | 741120 | 0.402 | 0.00534 | 0.000826 | 0.148 | 0.0003 | 92627 |
| 3088 | 240 | -0.582 | 122 | 6052 | 741120 | | | | | | |
| 3092 | 240 | -0.571 | 122 | 6074 | 743040 | 0.401 | 0.00123 | 0.000841 | 0.925 | 0.0003 | 92867 |
| 3096 | 240 | -0.571 | 122 | 6074 | 743040 | | | | | | |
| 3100 | 240 | -0.587 | 122 | 6096 | 744960 | 0.319 | 0.00167 | 0.00086 | 1.3 | 0.0003 | 93107 |
| 3104 | 240 | -0.587 | 122 | 6096 | 744960 | | | | | | |
| 3108 | 240 | -0.59 | 122 | 6118 | 746880 | 0.305 | 0.000237 | 0.000869 | -0.526 | 0.0003 | 93347 |
| 3112 | 240 | -0.59 | 122 | 6118 | 746880 | | | | | | |
| 3116 | 240 | -0.588 | 121 | 6138 | 748800 | 0.301 | 0.000278 | 0.000899 | -0.155 | 0.0003 | 93587 |
| 3120 | 240 | -0.588 | 121 | 6138 | 748800 | | | | | | |
| 3124 | 240 | -0.612 | 121 | 6156 | 750720 | 0.306 | 0.000177 | 0.000895 | -0.483 | 0.0003 | 93827 |
| 3128 | 240 | -0.612 | 121 | 6156 | 750720 | | | | | | |
| 3132 | 240 | -0.629 | 121 | 6176 | 752640 | 0.428 | 0.00898 | 0.000892 | -1.3 | 0.0003 | 94067 |
| 3136 | 240 | -0.629 | 121 | 6176 | 752640 | | | | | | |
| 3140 | 240 | -0.649 | 121 | 6196 | 754560 | 0.304 | 0.000168 | 0.000902 | 0.525 | 0.0003 | 94307 |
| 3144 | 240 | -0.649 | 121 | 6196 | 754560 | | | | | | |
| 3148 | 240 | -0.629 | 121 | 6214 | 756480 | 0.313 | 0.00033 | 0.000985 | 0.702 | 0.0003 | 94547 |
| 3152 | 240 | -0.629 | 121 | 6214 | 756480 | | | | | | |
| 3156 | 240 | -0.649 | 121 | 6235 | 758400 | 0.356 | 0.00497 | 0.000988 | -0.406 | 0.0003 | 94787 |
| 3160 | 240 | -0.649 | 121 | 6235 | 758400 | | | | | | |
| 3164 | 240 | -0.645 | 121 | 6254 | 760320 | 0.321 | 0.000361 | 0.000992 | -0.632 | 0.0003 | 95027 |
| 3168 | 240 | -0.645 | 121 | 6254 | 760320 | | | | | | |
| 3172 | 240 | -0.621 | 121 | 6277 | 762240 | 0.345 | 0.00164 | 0.000965 | 0.189 | 0.0003 | 95267 |
| 3176 | 240 | -0.621 | 121 | 6277 | 762240 | | | | | | |
| 3180 | 240 | -0.627 | 121 | 6303 | 764160 | 0.442 | 0.0016 | 0.000944 | 0.166 | 0.0003 | 95507 |
| 3184 | 240 | -0.627 | 121 | 6303 | 764160 | | | | | | |
| 3188 | 240 | -0.65 | 121 | 6330 | 766080 | 0.404 | 0.00308 | 0.000927 | 0.299 | 0.0003 | 95747 |
| 3192 | 240 | -0.65 | 121 | 6330 | 766080 | | | | | | |
| 3196 | 240 | -0.653 | 120 | 6357 | 768000 | 0.309 | 0.000533 | 0.000972 | 0.111 | 0.0003 | 95987 |
| 3200 | 240 | -0.653 | 120 | 6357 | 768000 | | | | | | |
| 3204 | 240 | -0.633 | 120 | 6378 | 769920 | 0.306 | 0.0002 | 0.000974 | -0.54 | 0.0003 | 96227 |
| 3208 | 240 | -0.633 | 120 | 6378 | 769920 | | | | | | |
| 3212 | 240 | -0.641 | 120 | 6399 | 771840 | 0.387 | 0.00272 | 0.000985 | 0.333 | 0.0003 | 96467 |
| 3216 | 240 | -0.641 | 120 | 6399 | 771840 | | | | | | |
| 3220 | 240 | -0.639 | 120 | 6421 | 773760 | 0.308 | 0.000194 | 0.000975 | -0.56 | 0.0003 | 96707 |
| 3224 | 240 | -0.639 | 120 | 6421 | 773760 | | | | | | |
| 3228 | 240 | -0.611 | 120 | 6442 | 775680 | 0.342 | 0.00722 | 0.000981 | 0.676 | 0.0003 | 96947 |
| 3232 | 240 | -0.611 | 120 | 6442 | 775680 | | | | | | |
| 3236 | 240 | -0.624 | 120 | 6462 | 777600 | 0.359 | 0.00237 | 0.000959 | -0.164 | 0.0003 | 97187 |
| 3240 | 240 | -0.624 | 120 | 6462 | 777600 | | | | | | |
| 3244 | 240 | -0.621 | 120 | 6482 | 779520 | 0.323 | 0.000259 | 0.000964 | 0.715 | 0.0003 | 97427 |
| 3248 | 240 | -0.621 | 120 | 6482 | 779520 | | | | | | |
| 3252 | 240 | -0.618 | 120 | 6504 | 781440 | 0.586 | 0.000762 | 0.00103 | -0.4 | 0.0003 | 97667 |
| 3256 | 240 | -0.618 | 120 | 6504 | 781440 | | | | | | |
| 3260 | 240 | -0.618 | 120 | 6526 | 783360 | 0.417 | 0.0107 | 0.00103 | -0.268 | 0.0003 | 97907 |
| 3264 | 240 | -0.618 | 120 | 6526 | 783360 | | | | | | |
| 3268 | 240 | -0.625 | 119 | 6549 | 785280 | 0.32 | 0.000161 | 0.000998 | -0.0602 | 0.0003 | 98147 |
| 3272 | 240 | -0.625 | 119 | 6549 | 785280 | | | | | | |
| 3276 | 240 | -0.63 | 119 | 6574 | 787200 | 0.322 | 0.000147 | 0.000976 | -0.708 | 0.0003 | 98387 |
| 3280 | 240 | -0.63 | 119 | 6574 | 787200 | | | | | | |
| 3284 | 240 | -0.637 | 119 | 6595 | 789120 | 0.324 | 0.000196 | 0.000939 | -0.311 | 0.0003 | 98627 |
| 3288 | 240 | -0.637 | 119 | 6595 | 789120 | | | | | | |
| 3292 | 240 | -0.649 | 119 | 6616 | 791040 | 0.382 | 0.000573 | 0.000938 | 0.346 | 0.0003 | 98867 |
| 3296 | 240 | -0.649 | 119 | 6616 | 791040 | | | | | | |
| 3300 | 240 | -0.658 | 119 | 6637 | 792960 | 0.325 | 0.000132 | 0.000917 | 0.26 | 0.0003 | 99107 |
| 3304 | 240 | -0.658 | 119 | 6637 | 792960 | | | | | | |
| 3308 | 240 | -0.654 | 119 | 6657 | 794880 | 0.459 | 0.0206 | 0.000891 | 0.176 | 0.0003 | 99347 |
| rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.654 | | time/ | | | episodes | 3312 | | fps | 119 | | time_elapsed | 6657 | | total_timesteps | 794880 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.654 | | time/ | | | episodes | 3316 | | fps | 119 | | time_elapsed | 6668 | | total_timesteps | 796800 | | train/ | | | actor_loss | 0.32 | | critic_loss | 0.000736 | | ent_coef | 0.000864 | | ent_coef_loss | -0.845 | | learning_rate | 0.0003 | | n_updates | 99587 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.654 | | time/ | | | episodes | 3320 | | fps | 119 | | time_elapsed | 6668 | | total_timesteps | 796800 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.662 | | time/ | | | episodes | 3324 | | fps | 119 | | time_elapsed | 6683 | | total_timesteps | 798720 | | train/ | | | actor_loss | 0.326 | | critic_loss | 0.000148 | | ent_coef | 0.000818 | | ent_coef_loss | 0.0262 | | learning_rate | 0.0003 | | n_updates | 99827 | --------------------------------- --------------------------------- | rollout/ | | | ep_len_mean | 240 | | ep_rew_mean | -0.662 | | time/ | | | episodes | 3328 | | fps | 119 | | time_elapsed | 6683 | | total_timesteps | 798720 | ---------------------------------
res_unhedged = rollout_policy_on_exogenous_X(
cm, X_parallel, policy="unhedged", action_max=u_scale, X_ref=X_ref
)
res_lq = rollout_policy_on_exogenous_X(
cm, X_parallel, policy="lq",
K_lq=K_lq, action_max=u_scale, X_ref=X_ref
)
res_rl = rollout_policy_on_exogenous_X(
cm, X_parallel, policy="rl",
rl_model=model_l1, action_max=u_scale, X_ref=X_ref
)
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\3068264565.py:22: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.) u_t = -float(K_lq @ x_t)
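The warning above is raised because `K_lq @ x_t` yields an array with one element rather than a true scalar, and `float()` on such an array is deprecated since NumPy 1.25. A minimal sketch of one way to avoid it, assuming the gain is a `(1, n)` row and the state an `(n,)` vector (the names `K_lq_demo` and `x_t_demo` and their values are hypothetical, and the shapes used in this notebook may differ):

```python
import numpy as np

# Hypothetical shapes: a (1, n) gain row and an (n,) state vector.
K_lq_demo = np.array([[0.5, -1.0]])
x_t_demo = np.array([2.0, 3.0])

prod = K_lq_demo @ x_t_demo   # shape (1,): a one-element array, not a scalar
u_t_demo = -prod.item()       # .item() extracts the single element as a Python float

print(prod.shape, u_t_demo)
```

`.item()` raises an error if the product has more than one element, so it also acts as a shape check on the feedback law.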
eval_unhedged = eval_L1_cost(
res_unhedged["NII"], res_unhedged["h"], res_unhedged["u"],
lambda_h=lambda_h_star, kappa_u=kappa_u_rl
)
eval_lq = eval_L1_cost(
res_lq["NII"], res_lq["h"], res_lq["u"],
lambda_h=lambda_h_star, kappa_u=kappa_u_rl
)
eval_rl = eval_L1_cost(
res_rl["NII"], res_rl["h"], res_rl["u"],
lambda_h=lambda_h_star, kappa_u=kappa_u_rl
)
print("L1 cost (lower is better):")
print("Unhedged:", eval_unhedged)
print("LQ:", eval_lq)
print("RL:", eval_rl)
L1 cost (lower is better):
Unhedged: 0.0014518402932128523
LQ: 0.0014462204424909896
RL: 0.001430356297024223
plt.figure(figsize=(9,4))
plt.plot(res_unhedged["NII"], label="Unhedged")
plt.plot(res_lq["NII"], label="LQ (L2)")
plt.plot(res_rl["NII"], label="RL (L1)")
plt.title("Stress: +200bp parallel")
plt.xlabel("t (months)")
plt.ylabel("NII")
plt.legend()
plt.tight_layout()
plt.show()
plt.figure(figsize=(9,3))
plt.plot(res_lq["h"], label="LQ hedge")
plt.plot(res_rl["h"], label="RL hedge")
plt.title("Hedge inventory")
plt.legend()
plt.tight_layout()
plt.show()
# ============================================================
# 0) ROLLOUT ADAPTER
# ============================================================
# Every policy is evaluated through one function that returns a rollout dict:
#   {"NII": (T-1,), "h": (T,), "u": (T-1,)}
#
def get_rollout(cm, X_path, policy_name,
                K_lq=None, rl_model=None,
                X_ref=None, action_max=50.0):
    """
    Adapter around the existing rollout_policy_on_exogenous_X.
    policy_name must be one of {"unhedged", "lq", "rl"}.
    Returns a dict with keys NII, h, u (each a NumPy array).
    """
    if policy_name == "unhedged":
        res = rollout_policy_on_exogenous_X(cm, X_path, "unhedged",
                                            action_max=action_max, X_ref=X_ref)
    elif policy_name == "lq":
        res = rollout_policy_on_exogenous_X(cm, X_path, "lq",
                                            K_lq=K_lq, action_max=action_max, X_ref=X_ref)
    elif policy_name == "rl":
        res = rollout_policy_on_exogenous_X(cm, X_path, "rl",
                                            rl_model=rl_model, action_max=action_max, X_ref=X_ref)
    else:
        raise ValueError("policy_name must be 'unhedged', 'lq', or 'rl'")
    return {"NII": np.asarray(res["NII"]), "h": np.asarray(res["h"]), "u": np.asarray(res["u"])}
# ============================================================
# 1) METRICS + TABLES
# ============================================================
def l1_economic_cost(nii, h, u, lambda_h, kappa_u):
    """
    Evaluation functional for the L1 setting:
        mean( NII^2 + lambda_h*h^2 + kappa_u*|u| )
    (h aligned to t where NII,u exist => h[:-1])
    """
    nii = np.asarray(nii)
    h = np.asarray(h)
    u = np.asarray(u)
    return float(np.mean(nii**2 + lambda_h*(h[:-1]**2) + kappa_u*np.abs(u)))

def summarize_rollout(roll, lambda_h=None, kappa_u=None):
    nii, h, u = roll["NII"], roll["h"], roll["u"]
    out = {
        "std_NII": float(np.std(nii)),
        "p05_NII": float(np.quantile(nii, 0.05)),
        "min_NII": float(np.min(nii)),
        "mean_NII": float(np.mean(nii)),
        "mean_abs_u": float(np.mean(np.abs(u))),
        "max_abs_u": float(np.max(np.abs(u))),
        "mean_abs_h": float(np.mean(np.abs(h))),
        "max_abs_h": float(np.max(np.abs(h))),
    }
    if (lambda_h is not None) and (kappa_u is not None):
        out["L1_cost"] = l1_economic_cost(nii, h, u, lambda_h, kappa_u)
    return out
def build_metrics_tables(cm, scenarios, K_lq, rl_model,
                         X_ref, action_max,
                         lambda_h_eval, kappa_u_eval):
    """
    scenarios: dict name -> X_path
    Returns:
        df_metrics: multiindex (scenario, policy)
        df_norm: each metric as a percentage of the Unhedged value, per scenario
    """
    rows = []
    for scen_name, X_path in scenarios.items():
        for pol, label in [("unhedged","Unhedged"), ("lq","LQ (L2)"), ("rl","RL (L1)")]:
            roll = get_rollout(cm, X_path, pol, K_lq=K_lq, rl_model=rl_model,
                               X_ref=X_ref, action_max=action_max)
            met = summarize_rollout(roll, lambda_h=lambda_h_eval, kappa_u=kappa_u_eval)
            rows.append({"scenario": scen_name, "policy": label, **met})
    df = pd.DataFrame(rows).set_index(["scenario","policy"]).sort_index()
    # Normalized table: divide each metric by the Unhedged metric (per scenario).
    # Unhedged has zero turnover and zero inventory, so the |u| and |h| columns
    # divide by zero: they come out as inf for hedged rows and NaN for Unhedged.
    metrics_cols = list(df.columns)
    df_norm = df.copy()
    for scen in df.index.get_level_values(0).unique():
        base = df.loc[(scen, "Unhedged"), metrics_cols]
        df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0
    return df, df_norm
# ============================================================
# 2) PLOTS
# ============================================================
def plot_nii_overlay(rolls, title):
    plt.figure(figsize=(9,4))
    for label, roll in rolls.items():
        plt.plot(roll["NII"], label=label)
    plt.title(title)
    plt.xlabel("t (months)")
    plt.ylabel("NII")
    plt.legend()
    plt.tight_layout()
    plt.show()

def plot_h_overlay(rolls, title):
    plt.figure(figsize=(9,3))
    for label, roll in rolls.items():
        plt.plot(roll["h"], label=label)
    plt.title(title)
    plt.xlabel("t (months)")
    plt.ylabel("Hedge inventory h_t")
    plt.legend()
    plt.tight_layout()
    plt.show()

def plot_absu_histogram(roll_lq, roll_rl, title):
    plt.figure(figsize=(8,4))
    plt.hist(np.abs(roll_lq["u"]), bins=40, alpha=0.6, label="LQ (L2)")
    plt.hist(np.abs(roll_rl["u"]), bins=40, alpha=0.6, label="RL (L1)")
    plt.title(title)
    plt.xlabel("|u_t| (absolute hedge change)")
    plt.ylabel("Frequency")
    plt.legend()
    plt.tight_layout()
    plt.show()

def plot_risk_turnover_scatter(df_metrics, title="Risk–turnover scatter"):
    """
    Scatter per scenario/policy:
        x = mean_abs_u, y = std_NII
    """
    plt.figure(figsize=(8,5))
    for (scen, pol), row in df_metrics.iterrows():
        x = row["mean_abs_u"]
        y = row["std_NII"]
        plt.scatter(x, y)
        plt.annotate(f"{scen}\n{pol}", (x, y), fontsize=8, xytext=(4,4), textcoords="offset points")
    plt.xlabel("Turnover proxy: mean(|u_t|)")
    plt.ylabel("Risk proxy: std(NII)")
    plt.title(title)
    plt.tight_layout()
    plt.show()
# ============================================================
# 3) RUN IT: define scenario dictionary + produce outputs
# ============================================================
# --- Provide scenario paths ---
scenarios = {
"Baseline (smoothed)": X_smooth,
"Stress: +200bp parallel": X_parallel,
"Stress: high vol x3": X_highvol,
}
# --- Evaluation settings ---
ACTION_MAX = 50.0
LAMBDA_H_EVAL = lambda_h_star # chosen hedge-inventory penalty
KAPPA_U_EVAL = 3e-4 # same kappa_u from L1 RL training/eval
# --- Models from before ---
# K_lq: from Riccati solution for LQ benchmark
# model_l1: trained SAC model for RL (L1)
df_metrics, df_norm = build_metrics_tables(
cm=cm,
scenarios=scenarios,
K_lq=K_lq,
rl_model=model_l1,
X_ref=X_ref,
action_max=ACTION_MAX,
lambda_h_eval=LAMBDA_H_EVAL,
kappa_u_eval=KAPPA_U_EVAL
)
print("=== TABLE 1: Core metrics ===")
display(df_metrics)
print("=== TABLE 2: Normalized to Unhedged (Unhedged=100) ===")
display(df_norm)
# ============================================================
# 4) SELECTED PLOTS (6 total)
# ============================================================
# Helper to fetch rolls for a scenario
def get_rolls_for_scenario(X_path, scen_label):
    # scen_label is kept for readability at the call site; it is not used here.
    roll_u = get_rollout(cm, X_path, "unhedged", X_ref=X_ref, action_max=ACTION_MAX)
    roll_l = get_rollout(cm, X_path, "lq", K_lq=K_lq, X_ref=X_ref, action_max=ACTION_MAX)
    roll_r = get_rollout(cm, X_path, "rl", rl_model=model_l1, X_ref=X_ref, action_max=ACTION_MAX)
    return {
        "Unhedged": roll_u,
        "LQ (L2)": roll_l,
        "RL (L1)": roll_r
    }
# (1) Baseline NII overlay
rolls_base = get_rolls_for_scenario(X_smooth, "Baseline (smoothed)")
plot_nii_overlay(rolls_base, "Baseline: NII paths (Unhedged vs LQ vs RL-L1)")
# (2) Stress +200bp NII overlay
rolls_par = get_rolls_for_scenario(X_parallel, "Stress: +200bp parallel")
plot_nii_overlay(rolls_par, "Stress (+200bp parallel): NII paths")
# (3) Stress high-vol NII overlay
rolls_hv = get_rolls_for_scenario(X_highvol, "Stress: high vol x3")
plot_nii_overlay(rolls_hv, "Stress (high vol x3): NII paths")
# (4) Baseline hedge inventory overlay (LQ vs RL)
plot_h_overlay({"LQ (L2)": rolls_base["LQ (L2)"], "RL (L1)": rolls_base["RL (L1)"]},
"Baseline: Hedge inventory (LQ vs RL-L1)")
# (5) Histogram of |u| (baseline) to show sparse trading
plot_absu_histogram(rolls_base["LQ (L2)"], rolls_base["RL (L1)"],
"Baseline: Trading sparsity (|u_t| histogram)")
# (6) Risk–turnover scatter using df_metrics (all scenarios/policies)
plot_risk_turnover_scatter(df_metrics, title="Risk–turnover map across scenarios (policy dots)")
=== TABLE 1: Core metrics ===
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\635850829.py:91: RuntimeWarning: divide by zero encountered in divide df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0 C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\635850829.py:91: RuntimeWarning: invalid value encountered in divide df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0
| scenario | policy | std_NII | p05_NII | min_NII | mean_NII | mean_abs_u | max_abs_u | mean_abs_h | max_abs_h | L1_cost |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline (smoothed) | LQ (L2) | 0.037672 | -0.028228 | -0.064571 | 0.028723 | 0.130948 | 0.445572 | 14.301091 | 19.293542 | 0.002502 |
| Baseline (smoothed) | RL (L1) | 0.037807 | -0.028121 | -0.063079 | 0.029002 | 0.095940 | 0.362447 | 14.424503 | 20.110600 | 0.002524 |
| Baseline (smoothed) | Unhedged | 0.041502 | -0.028967 | -0.069115 | 0.031514 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.002716 |
| Stress: +200bp parallel | LQ (L2) | 0.010117 | 0.020354 | 0.020128 | 0.033284 | 0.050177 | 0.539936 | 14.585011 | 17.558518 | 0.001446 |
| Stress: +200bp parallel | RL (L1) | 0.010486 | 0.020406 | 0.020177 | 0.033765 | 0.040012 | 0.281692 | 12.640191 | 15.121961 | 0.001430 |
| Stress: +200bp parallel | Unhedged | 0.011522 | 0.021739 | 0.021490 | 0.036319 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001452 |
| Stress: high vol x3 | LQ (L2) | 0.028799 | -0.010316 | -0.054684 | 0.036726 | 0.090683 | 0.484180 | 13.971398 | 17.529290 | 0.002409 |
| Stress: high vol x3 | RL (L1) | 0.029025 | -0.010358 | -0.055629 | 0.037088 | 0.064955 | 0.307148 | 12.191542 | 17.197047 | 0.002401 |
| Stress: high vol x3 | Unhedged | 0.031582 | -0.010008 | -0.058414 | 0.040042 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.002601 |
=== TABLE 2: Normalized to Unhedged (Unhedged=100) ===
| scenario | policy | std_NII | p05_NII | min_NII | mean_NII | mean_abs_u | max_abs_u | mean_abs_h | max_abs_h | L1_cost |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline (smoothed) | LQ (L2) | 90.771525 | 97.450109 | 93.425399 | 91.145057 | inf | inf | inf | inf | 92.151849 |
| Baseline (smoothed) | RL (L1) | 91.096380 | 97.078798 | 91.266641 | 92.029153 | inf | inf | inf | inf | 92.950930 |
| Baseline (smoothed) | Unhedged | 100.000000 | 100.000000 | 100.000000 | 100.000000 | NaN | NaN | NaN | NaN | 100.000000 |
| Stress: +200bp parallel | LQ (L2) | 87.800881 | 93.628756 | 93.665736 | 91.642397 | inf | inf | inf | inf | 99.612915 |
| Stress: +200bp parallel | RL (L1) | 91.001717 | 93.870465 | 93.893229 | 92.966375 | inf | inf | inf | inf | 98.520223 |
| Stress: +200bp parallel | Unhedged | 100.000000 | 100.000000 | 100.000000 | 100.000000 | NaN | NaN | NaN | NaN | 100.000000 |
| Stress: high vol x3 | LQ (L2) | 91.188148 | 103.082277 | 93.614348 | 91.717293 | inf | inf | inf | inf | 92.611408 |
| Stress: high vol x3 | RL (L1) | 91.902659 | 103.493855 | 95.231757 | 92.622201 | inf | inf | inf | inf | 92.335225 |
| Stress: high vol x3 | Unhedged | 100.000000 | 100.000000 | 100.000000 | 100.000000 | NaN | NaN | NaN | NaN | 100.000000 |
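The inf and NaN entries in the turnover and inventory columns arise because the Unhedged row is exactly zero for those metrics, so the normalization divides by zero. One possible alternative, sketched here under the assumption that an undefined ratio is better reported as NaN than as inf, masks zero baselines before dividing. The function `normalize_to_baseline` and the toy frame are illustrative only and are not part of the notebook:

```python
import numpy as np
import pandas as pd

def normalize_to_baseline(df, baseline="Unhedged"):
    """Express each metric as a percentage of the baseline policy, per scenario.

    Expects a MultiIndex with levels named "scenario" and "policy".
    Metrics whose baseline value is zero are returned as NaN (ratio undefined).
    """
    parts = []
    for scen in df.index.get_level_values("scenario").unique():
        block = df.xs(scen, level="scenario")      # rows indexed by policy
        base = block.loc[baseline]
        base = base.where(base != 0.0)             # zero baseline -> NaN
        normed = block.div(base, axis="columns") * 100.0
        normed.index = pd.MultiIndex.from_product(
            [[scen], normed.index], names=["scenario", "policy"]
        )
        parts.append(normed)
    return pd.concat(parts)

# Toy frame with made-up numbers, only to show the behaviour.
toy = pd.DataFrame(
    {"std_NII": [0.5, 1.0], "mean_abs_u": [0.1, 0.0]},
    index=pd.MultiIndex.from_tuples(
        [("S", "LQ"), ("S", "Unhedged")], names=["scenario", "policy"]
    ),
)
toy_norm = normalize_to_baseline(toy)
print(toy_norm)
```

With this variant no divide-by-zero warning is raised, and the columns that cannot be normalized are NaN in every row, which makes them easy to drop with `dropna(axis="columns", how="all")`.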