Reinforcement Learning and Optimal Control for IRRBB Hedging Under Uncertainty¶

This notebook builds an example of IRRBB (NII) hedging using term structure models, statistical time series, control theory, and reinforcement learning:

  1. Yield curve data (GSW): download + clean a panel of zero-coupon yields.
  2. Term structure model:
    • Start with Diebold–Li (DNS) factors estimated by cross-sectional regression (OLS).
    • Put the model into state-space form and apply Kalman filtering/smoothing.
    • Extend to AFNS (Arbitrage-Free Nelson–Siegel) by adding the no-arbitrage yield adjustment.
  3. Banking-book NII model: build a simplified balance sheet and compute NII using representative asset/liability repricing rates plus a hedge instrument.
  4. Dynamic hedging:
    • Classical control (LQ) under quadratic objectives (risk vs trading/inventory penalties).
    • Reinforcement Learning (SAC), first under the same quadratic setting, then under L1 transaction costs where LQ is no longer optimal.
  5. Stress testing & comparison: evaluate Unhedged vs LQ vs RL across baseline and stress scenarios with tables and plots.

The goal is a controlled comparison: show where classical methods dominate (linear–quadratic world) and where RL becomes valuable (realistic frictions such as L1 costs).

1) Yield curve dataset (GSW) and preprocessing¶

We use the Gurkaynak–Sack–Wright (GSW) U.S. Treasury zero-coupon curve because it is a clean, widely used academic dataset with a long history and many maturities. We use monthly frequency for this exercise.

In the next cells we:

  • download (or load cached) GSW yields,
  • convert columns/maturities,
  • store both daily and monthly versions to keep the ETL step reproducible.
In [1]:
from pathlib import Path
import pandas as pd
import requests
from io import StringIO
import matplotlib.pyplot as plt
import numpy as np

DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)

RAW_CSV_PATH = DATA_DIR / "feds200628.csv"   # local cached file
DAILY_ZC_CSV_PATH = DATA_DIR / "gsw_zero_coupon_daily.csv"
MONTHLY_ZC_CSV_PATH = DATA_DIR / "gsw_zero_coupon_monthly.csv"
In [2]:
GSW_URL = "https://www.federalreserve.gov/data/yield-curve-tables/feds200628.csv"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/127.0.0.1 Safari/537.36"
}

if not RAW_CSV_PATH.exists():
    print("Local GSW file not found. Downloading from Fed...")
    resp = requests.get(GSW_URL, headers=headers)
    resp.raise_for_status()

    # Some versions require skipping first 9 rows, but we write the raw text first
    with open(RAW_CSV_PATH, "w", encoding="utf-8") as f:
        f.write(resp.text)

    print(f"Saved raw GSW CSV to {RAW_CSV_PATH.resolve()}")
else:
    print("Local GSW file already exists. Skipping download.")
Local GSW file already exists. Skipping download.
In [3]:
print("Loading local GSW CSV...")

df_raw = pd.read_csv(RAW_CSV_PATH, skiprows=9)

print("Raw shape:", df_raw.shape)
df_raw.head()
Loading local GSW CSV...
Raw shape: (16853, 100)
Out[3]:
Date BETA0 BETA1 BETA2 BETA3 SVEN1F01 SVEN1F04 SVEN1F09 SVENF01 SVENF02 ... SVENY23 SVENY24 SVENY25 SVENY26 SVENY27 SVENY28 SVENY29 SVENY30 TAU1 TAU2
0 1961-06-14 3.917606 -1.277955 -1.949397 0.0 3.8067 3.9562 NaN 3.5492 3.8825 ... NaN NaN NaN NaN NaN NaN NaN NaN 0.339218 -999.99
1 1961-06-15 3.978498 -1.257404 -2.247617 0.0 3.8694 4.0183 NaN 3.5997 3.9460 ... NaN NaN NaN NaN NaN NaN NaN NaN 0.325775 -999.99
2 1961-06-16 3.984350 -1.429538 -1.885024 0.0 3.8634 4.0242 NaN 3.5957 3.9448 ... NaN NaN NaN NaN NaN NaN NaN NaN 0.348817 -999.99
3 1961-06-19 4.004379 -0.723311 -3.310743 0.0 3.9196 4.0447 NaN 3.6447 3.9842 ... NaN NaN NaN NaN NaN NaN NaN NaN 0.282087 -999.99
4 1961-06-20 3.985789 -0.900432 -2.844809 0.0 3.8732 4.0257 NaN 3.5845 3.9552 ... NaN NaN NaN NaN NaN NaN NaN NaN 0.310316 -999.99

5 rows × 100 columns

In [4]:
df_raw.columns
Out[4]:
Index(['Date', 'BETA0', 'BETA1', 'BETA2', 'BETA3', 'SVEN1F01', 'SVEN1F04',
       'SVEN1F09', 'SVENF01', 'SVENF02', 'SVENF03', 'SVENF04', 'SVENF05',
       'SVENF06', 'SVENF07', 'SVENF08', 'SVENF09', 'SVENF10', 'SVENF11',
       'SVENF12', 'SVENF13', 'SVENF14', 'SVENF15', 'SVENF16', 'SVENF17',
       'SVENF18', 'SVENF19', 'SVENF20', 'SVENF21', 'SVENF22', 'SVENF23',
       'SVENF24', 'SVENF25', 'SVENF26', 'SVENF27', 'SVENF28', 'SVENF29',
       'SVENF30', 'SVENPY01', 'SVENPY02', 'SVENPY03', 'SVENPY04', 'SVENPY05',
       'SVENPY06', 'SVENPY07', 'SVENPY08', 'SVENPY09', 'SVENPY10', 'SVENPY11',
       'SVENPY12', 'SVENPY13', 'SVENPY14', 'SVENPY15', 'SVENPY16', 'SVENPY17',
       'SVENPY18', 'SVENPY19', 'SVENPY20', 'SVENPY21', 'SVENPY22', 'SVENPY23',
       'SVENPY24', 'SVENPY25', 'SVENPY26', 'SVENPY27', 'SVENPY28', 'SVENPY29',
       'SVENPY30', 'SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05',
       'SVENY06', 'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11',
       'SVENY12', 'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17',
       'SVENY18', 'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23',
       'SVENY24', 'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29',
       'SVENY30', 'TAU1', 'TAU2'],
      dtype='object')
In [5]:
# 1. Normalize column names (just in case)
df = df_raw.copy()

# Make sure the date column is correctly named
date_col_candidates = ["Date", "date", "DATE"]
date_col = None
for c in date_col_candidates:
    if c in df.columns:
        date_col = c
        break

if date_col is None:
    raise ValueError(f"Could not find a date column in raw data. Columns: {df.columns}")

df[date_col] = pd.to_datetime(df[date_col])
df = df.sort_values(by=date_col)

# 2. Select zero-coupon columns (SVENYxx)
zc_cols = [c for c in df.columns if c.startswith("SVENY")]
print("Zero-coupon columns:", zc_cols)

if not zc_cols:
    raise ValueError("No zero-coupon (SVENYxx) columns found. Check raw CSV format.")

# 3. Keep Date + zero-coupon columns
df_zc = df[[date_col] + zc_cols].copy()
df_zc = df_zc.rename(columns={date_col: "date"})
df_zc.set_index("date", inplace=True)
df_zc.sort_index(inplace=True)

print("Zero-coupon daily data (raw units):")
df_zc.head()
Zero-coupon columns: ['SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05', 'SVENY06', 'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11', 'SVENY12', 'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17', 'SVENY18', 'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23', 'SVENY24', 'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29', 'SVENY30']
Zero-coupon daily data (raw units):
Out[5]:
SVENY01 SVENY02 SVENY03 SVENY04 SVENY05 SVENY06 SVENY07 SVENY08 SVENY09 SVENY10 ... SVENY21 SVENY22 SVENY23 SVENY24 SVENY25 SVENY26 SVENY27 SVENY28 SVENY29 SVENY30
date
1961-06-14 2.9825 3.3771 3.5530 3.6439 3.6987 3.7351 3.7612 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-06-15 2.9941 3.4137 3.5981 3.6930 3.7501 3.7882 3.8154 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-06-16 3.0012 3.4142 3.5994 3.6953 3.7531 3.7917 3.8192 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-06-19 2.9949 3.4386 3.6252 3.7199 3.7768 3.8147 3.8418 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-06-20 2.9833 3.4101 3.5986 3.6952 3.7533 3.7921 3.8198 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 30 columns

In [6]:
# The SVENYxx convention:
# SVENY01 -> 1-year zero-coupon, SVENY02 -> 2-year, ..., typically up to 30.
# We'll map 'SVENY01' -> 1.0, 'SVENY02' -> 2.0, etc., and rename columns to "1.0","2.0",...

maturities_years = []
for c in zc_cols:
    # x = last two chars -> '01', '02', etc.
    # Some files may have 3 digits if > 99, but for Treasuries we expect <= 30.
    suffix = c.replace("SVENY", "")
    try:
        mat = int(suffix)
    except ValueError:
        raise ValueError(f"Unexpected SVENY column name format: {c}")
    maturities_years.append(mat)

# New column names as string years, e.g. "1.0", "2.0", "3.0", ...
new_cols = [f"{mat:.1f}" for mat in maturities_years]

zc_renaming = dict(zip(zc_cols, new_cols))
df_zc = df_zc.rename(columns=zc_renaming)

# Convert from percent to decimals
df_zc = df_zc.astype(float) / 100.0

print("Zero-coupon daily yields in decimals:")
df_zc.head()
Zero-coupon daily yields in decimals:
Out[6]:
1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ... 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0
date
1961-06-14 0.029825 0.033771 0.035530 0.036439 0.036987 0.037351 0.037612 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-06-15 0.029941 0.034137 0.035981 0.036930 0.037501 0.037882 0.038154 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-06-16 0.030012 0.034142 0.035994 0.036953 0.037531 0.037917 0.038192 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-06-19 0.029949 0.034386 0.036252 0.037199 0.037768 0.038147 0.038418 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-06-20 0.029833 0.034101 0.035986 0.036952 0.037533 0.037921 0.038198 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 30 columns

In [7]:
df_zc.to_csv(DAILY_ZC_CSV_PATH, index=True)
In [8]:
# Resample to monthly (end-of-month yields)
df_zc_monthly = df_zc.resample("ME").last()

# Forward-fill any gaps (holidays etc.)
df_zc_monthly = df_zc_monthly.ffill()

# Drop rows that are completely NaN (if any)
df_zc_monthly = df_zc_monthly.dropna(how="all")

print("Monthly zero-coupon yields (decimals):")
df_zc_monthly.head()
Monthly zero-coupon yields (decimals):
Out[8]:
1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ... 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0
date
1961-06-30 0.029011 0.032795 0.035036 0.036316 0.037109 0.037640 0.038020 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-07-31 0.027780 0.032304 0.035068 0.036787 0.037907 0.038678 0.039234 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-08-31 0.029863 0.033990 0.036481 0.037919 0.038812 0.039412 0.039841 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-09-30 0.029358 0.033250 0.035412 0.036661 0.037442 0.037968 0.038345 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1961-10-31 0.028936 0.032396 0.034616 0.036087 0.037096 0.037813 0.038339 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 30 columns

In [9]:
df_zc_monthly.to_csv(MONTHLY_ZC_CSV_PATH, index=True)
print(f"Saved monthly zero-coupon panel to: {MONTHLY_ZC_CSV_PATH.resolve()}")
Saved monthly zero-coupon panel to: C:\Users\thoma\Desktop\portfolio projects\P3 - optimal NII hedging\data\gsw_zero_coupon_monthly.csv
In [10]:
# Plot a few maturities to check series look reasonable
sample_mats = ["1.0", "5.0", "10.0", "30.0"]
sample_mats = [m for m in sample_mats if m in df_zc_monthly.columns]

df_zc_monthly[sample_mats].plot(figsize=(10, 5))
plt.title("GSW Zero-Coupon Yields (Monthly)")
plt.xlabel("Date")
plt.ylabel("Yield (decimal)")
plt.grid(True)
plt.show()
No description has been provided for this image

2) DNS (Diebold–Li) factor model: cross-sectional OLS¶

We start with the standard dynamic Nelson–Siegel representation of the yield curve as a function of time to maturity $\tau$:

$ y_t(\tau) = \beta_{1,t}

  • \beta_{2,t}\left(\frac{1-e^{-\lambda \tau}}{\lambda \tau}\right)
  • \beta_{3,t}\left(\frac{1-e^{-\lambda \tau}}{\lambda \tau}-e^{-\lambda \tau}\right)
  • \varepsilon_{t} $

Parameters:

  • $\beta_{1,t}$: level
  • $\beta_{2,t}$: slope
  • $\beta_{3,t}$: curvature
  • $\lambda$: controls the maturity where curvature loads most strongly

At each date $t$, the factors can be estimated by OLS across maturities. This gives a fast, transparent baseline estimate of the latent curve factors.

In [11]:
from statsmodels.tsa.api import VAR
In [12]:
DATA_DIR = Path("data")

df_yields = pd.read_csv(DATA_DIR / "gsw_zero_coupon_monthly.csv",
                        index_col=0, parse_dates=True)

# only get data from 1990 onwards
df_yields = df_yields["1990-01-01":]

print(df_yields.shape)
df_yields.head()
(433, 30)
Out[12]:
1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ... 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0
date
1990-01-31 0.080998 0.081567 0.082178 0.082620 0.082924 0.083137 0.083292 0.083409 0.083500 0.083573 ... 0.083916 0.083930 0.083943 0.083955 0.083966 0.083976 0.083985 0.083994 0.084002 0.084010
1990-02-28 0.080925 0.082517 0.083358 0.083810 0.084082 0.084263 0.084392 0.084489 0.084564 0.084624 ... 0.084907 0.084918 0.084929 0.084939 0.084948 0.084956 0.084964 0.084971 0.084977 0.084984
1990-03-31 0.083192 0.084778 0.085420 0.085639 0.085698 0.085701 0.085687 0.085670 0.085654 0.085640 ... 0.085570 0.085567 0.085565 0.085562 0.085560 0.085558 0.085556 0.085554 0.085553 0.085551
1990-04-30 0.085684 0.087799 0.088629 0.088956 0.089093 0.089156 0.089190 0.089210 0.089225 0.089235 ... 0.089284 0.089286 0.089288 0.089290 0.089291 0.089293 0.089294 0.089295 0.089296 0.089297
1990-05-31 0.081363 0.083009 0.084012 0.084568 0.084881 0.085058 0.085157 0.085208 0.085232 0.085239 ... 0.085137 0.085130 0.085123 0.085117 0.085111 0.085106 0.085101 0.085096 0.085092 0.085088

5 rows × 30 columns

In [13]:
def dl_loadings(maturities: np.ndarray, lam: float) -> np.ndarray:
    tau = maturities
    lam_tau = lam * tau

    with np.errstate(divide="ignore", invalid="ignore"):
        f1 = np.ones_like(tau)
        f2 = (1 - np.exp(-lam_tau)) / lam_tau
        f3 = f2 - np.exp(-lam_tau)

    f2 = np.where(tau == 0, 1.0, f2)
    f3 = np.where(tau == 0, 0.0, f3)

    return np.column_stack([f1, f2, f3])
In [14]:
def estimate_diebold_li_factors(df_yields: pd.DataFrame, lam: float = 0.0609):
    maturities = np.array([float(c) for c in df_yields.columns])
    sort_idx = np.argsort(maturities)
    mats_sorted = maturities[sort_idx]

    df_sorted = df_yields.iloc[:, sort_idx]

    X = dl_loadings(mats_sorted, lam)

    betas = []
    dates = []

    for date, row in df_sorted.iterrows():
        y = row.values.astype(float)
        mask = ~np.isnan(y)

        X_m = X[mask]
        y_m = y[mask]

        if y_m.shape[0] < 3:
            continue

        beta_hat = np.linalg.inv(X_m.T @ X_m) @ X_m.T @ y_m
        betas.append(beta_hat)
        dates.append(date)

    return pd.DataFrame(betas, index=dates, columns=["level","slope","curvature"]).sort_index()
In [15]:
lam = 0.0609  # canonical Diebold–Li value
factors_ols = estimate_diebold_li_factors(df_yields, lam)
factors_ols.to_csv(DATA_DIR / "dl_factors_ols.csv")

print(factors_ols.head())
               level     slope  curvature
1990-01-31  0.076733  0.004404   0.016872
1990-02-28  0.075251  0.006550   0.021459
1990-03-31  0.079159  0.005193   0.012616
1990-04-30  0.080233  0.006865   0.018714
1990-05-31  0.072812  0.009628   0.024918

3) Time-series dynamics for the factors¶

In generic fashion, a state-space model consists of two equations:

State (Transition) Equation¶

$ \mathbf{x}_{t+1} = f(\mathbf{x}_t) + \boldsymbol{\varepsilon}_{t+1}, \qquad \boldsymbol{\varepsilon}_{t+1} \sim \mathcal{N}(0, Q) $

Measurement (Observation) Equation¶

$ \mathbf{y}_t = g(\mathbf{x}_t) + \boldsymbol{\eta}_t, \qquad \boldsymbol{\eta}_t \sim \mathcal{N}(0, R) $

To move from “static cross-sectional fits” to a full state-space model, we first need a law of motion for the factors i.e. a transition equation:

$ B_{t+1} = A B_{t} (\tau) + \eta_t, \quad \eta_t \sim \mathcal{N}(0, Q), $

where $B_t = [l_t, s_t, c_t] = [\beta_{1,t}, \beta_{2,t}, \beta_{3,t}]$ correspond to our DNS factors (level, slope and curvature).

A VAR(1) is a natural first choice for the dynamics:

  • flexible enough to capture persistence and cross-factor interactions,
  • still linear-Gaussian (useful for Kalman filtering),
  • aligns with a control-theory / LQ framework later.

The fitted VAR parameters $(A, Q)$ become the state transition in the Kalman filter. We use the VAR module from the statmodels library.

In [16]:
var_model = VAR(factors_ols)
var_res = var_model.fit(maxlags=1)

A = var_res.coefs[0]       # transition matrix
Q = var_res.sigma_u        # state noise covariance

print("Transition matrix A:")
print(A)
print("State noise covariance Q:")
print(Q)
Transition matrix A:
[[ 0.91940097  0.08694316 -0.04380703]
 [ 0.07225808  0.88825823  0.04739625]
 [ 0.1174264  -0.12157507  1.05306669]]
State noise covariance Q:
              level     slope  curvature
level      0.000129 -0.000121  -0.000251
slope     -0.000121  0.000121   0.000233
curvature -0.000251  0.000233   0.000525
C:\Users\thoma\.conda\envs\pymc_env\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency ME will be used.
  self._init_dates(dates, freq)

4) Measurement equation and residual covariance¶

In the measurement equation, yields are linear in the factors (given $\lambda$):

$ Y_t = y_t (\tau) = H(\lambda, \tau) B_t + \varepsilon_t,\quad \varepsilon_t \sim \mathcal{N}(0, R), $

where $H$ contains the factor loadings.

Key practical step: estimate $R$ (measurement noise) using OLS residuals and ensure that:

  • maturities are aligned across dates,
  • missing values are handled consistently,
  • the clean yield matrix $Y$ and factor estimates are dimensionally compatible.

This makes the later state-space computations robust and reproducible.

In [17]:
def estimate_measurement_cov(df_yields, factors_ols, lam):
    # Align by date
    df_y, df_b = df_yields.align(factors_ols, join="inner", axis=0)

    # Coerce to numeric and drop any rows with NaNs
    df_y = df_y.apply(pd.to_numeric, errors="coerce")
    df_b = df_b.apply(pd.to_numeric, errors="coerce")

    row_mask = df_y.notna().all(axis=1) & df_b.notna().all(axis=1)
    df_y = df_y.loc[row_mask]
    df_b = df_b.loc[row_mask]

    print("Measurement cov – using", df_y.shape[0], "dates")

    maturities = np.array([float(c) for c in df_y.columns])
    H = dl_loadings(maturities, lam)  # (n, 3)

    Y = df_y.values   # (T, n)
    B = df_b.values   # (T, 3)

    residuals = []
    for t in range(Y.shape[0]):
        y_t = Y[t, :]        # (n,)
        beta_t = B[t, :]     # (3,)
        y_hat_t = H @ beta_t # (n,)
        e_t = y_t - y_hat_t
        residuals.append(e_t)

    E = np.vstack(residuals)   # (T, n)
    R_full = np.cov(E, rowvar=False)  # cov across maturities

    print("Any NaNs in R_full?", np.isnan(R_full).any())

    sigma2 = float(np.nanmean(np.diag(R_full)))
    n = df_y.shape[1]
    R = sigma2 * np.eye(n)

    return R, H, df_y, df_b
In [18]:
R, H, df_y_clean, df_b_clean = estimate_measurement_cov(df_yields, factors_ols, lam)
print("Shapes: Y", df_y_clean.shape, "H", H.shape, "R", R.shape)
Measurement cov – using 433 dates
Any NaNs in R_full? False
Shapes: Y (433, 30) H (30, 3) R (30, 30)

5) Kalman filtering and smoothing (DNS)¶

The OLS factors treat each date independently. A state-space approach instead combines:

  • cross-sectional information from the yield curve at date $t$,
  • time-series information from the factor dynamics.

The Kalman filter is the core algorithm used to perform inference in linear Gaussian state-space models. In this project, it is used to estimate and infer the latent yield-curve factors (level, slope, curvature) from observed yields.

We compute:

  • Predicted state: $ \beta_{t|t-1} $ (before seeing yields at $t$)
  • Filtered state: $ \beta_{t|t} $ (after incorporating yields at $t$)
  • Smoothed state: $ \beta_{t|T} $ (using the full sample $1..T$)

The Kalman framework distinguishes these three different estimates of the state.

Smoothed factors are especially useful for downstream economic applications because they reduce estimation noise while staying model-consistent.

In [19]:
def kalman_filter_smoother(Y, H, A, Q, R, beta0=None, P0=None):
    Y = np.asarray(Y)
    H = np.asarray(H)
    A = np.asarray(A)
    Q = np.asarray(Q)
    R = np.asarray(R)

    T, n_mats = Y.shape
    n_states = A.shape[0]

    assert H.shape == (n_mats, n_states), f"H shape {H.shape} != ({n_mats},{n_states})"
    assert A.shape == (n_states, n_states)
    assert Q.shape == (n_states, n_states)
    assert R.shape == (n_mats, n_mats)

    beta_pred = np.zeros((T, n_states))
    P_pred = np.zeros((T, n_states, n_states))
    beta_filt = np.zeros((T, n_states))
    P_filt = np.zeros((T, n_states, n_states))

    I = np.eye(n_states)

    if beta0 is None:
        beta0 = np.zeros(n_states)
    if P0 is None:
        P0 = 10.0 * np.eye(n_states)

    beta_prev = beta0
    P_prev = P0

    for t in range(T):
        # Prediction
        beta_t_pred = A @ beta_prev
        P_t_pred = A @ P_prev @ A.T + Q

        # Update
        y_t = Y[t, :]  # (n_mats,)
        S_t = H @ P_t_pred @ H.T + R   # (n_mats, n_mats)
        K_t = P_t_pred @ H.T @ np.linalg.inv(S_t)  # (n_states, n_mats)

        y_hat_t = H @ beta_t_pred     # (n_mats,)
        innov = y_t - y_hat_t         # (n_mats,)

        beta_t_filt = beta_t_pred + K_t @ innov
        P_t_filt = (I - K_t @ H) @ P_t_pred

        beta_pred[t] = beta_t_pred
        P_pred[t] = P_t_pred
        beta_filt[t] = beta_t_filt
        P_filt[t] = P_t_filt

        beta_prev = beta_t_filt
        P_prev = P_t_filt

    # RTS smoother
    beta_smooth = np.zeros_like(beta_filt)
    P_smooth = np.zeros_like(P_filt)

    beta_smooth[-1] = beta_filt[-1]
    P_smooth[-1] = P_filt[-1]

    for t in range(T - 2, -1, -1):
        P_f = P_filt[t]
        P_p_next = P_pred[t + 1]

        C_t = P_f @ A.T @ np.linalg.inv(P_p_next)  # (n_states, n_states)

        beta_smooth[t] = beta_filt[t] + C_t @ (beta_smooth[t + 1] - beta_pred[t + 1])
        P_smooth[t] = P_f + C_t @ (P_smooth[t + 1] - P_p_next) @ C_t.T

    return beta_filt, beta_smooth
In [20]:
# 1) OLS factors
factors_ols = estimate_diebold_li_factors(df_yields, lam)

# 2) A, Q from VAR on OLS factors
var_model = VAR(factors_ols)
var_res = var_model.fit(maxlags=1)
A = var_res.coefs[0]
Q = var_res.sigma_u

# 3) R, H, and cleaned yields/factors
R, H, df_y_clean, df_b_clean = estimate_measurement_cov(df_yields, factors_ols, lam)

print("Shapes before Kalman:")
print("Y:", df_y_clean.shape)
print("H:", H.shape)
print("A:", A.shape)
print("Q:", Q.shape)
print("R:", R.shape)

# 4) Run Kalman + smoother
Y = df_y_clean.values  # (T, n)
beta0 = df_b_clean.iloc[0].values  # first OLS beta as init
P0 = np.eye(3)

beta_filt, beta_smooth = kalman_filter_smoother(Y, H, A, Q, R, beta0, P0)

idx = df_b_clean.index
cols = ["level", "slope", "curvature"]

factors_filt = pd.DataFrame(beta_filt, index=idx, columns=cols)
factors_smooth = pd.DataFrame(beta_smooth, index=idx, columns=cols)
Measurement cov – using 433 dates
Any NaNs in R_full? False
Shapes before Kalman:
Y: (433, 30)
H: (30, 3)
A: (3, 3)
Q: (3, 3)
R: (30, 30)
C:\Users\thoma\.conda\envs\pymc_env\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency ME will be used.
  self._init_dates(dates, freq)

6) Applying the Kalman filter/smoother¶

We initialize the state and covariance and run the filter forward and the RTS smoother backward.

Two sanity checks matter here:

  1. Shapes: $Y$ is (T × N maturities), $H$ is (N × 3), states are (T × 3).
  2. Scale: yields should be in consistent units (e.g., decimals rather than percent) throughout.

The output is a time series of factor estimates with three versions (predicted, filtered, smoothed).

In [21]:
# Initial state: use first OLS estimate as beta0
beta0 = df_b_clean.iloc[0].values.astype(float)  # shape (3,)
P0 = np.eye(3) * 1.0  # initial covariance; you can tweak the scale

# Y is the observation matrix (T, n_mats)
Y = df_y_clean.values.astype(float)

beta_filt, beta_smooth = kalman_filter_smoother(
    Y, H, A, Q, R,
    beta0=beta0,
    P0=P0
)

beta_filt.shape, beta_smooth.shape
Out[21]:
((433, 3), (433, 3))
In [22]:
idx = df_b_clean.index
cols = ["level", "slope", "curvature"]

factors_filt = pd.DataFrame(beta_filt, index=idx, columns=cols)
factors_smooth = pd.DataFrame(beta_smooth, index=idx, columns=cols)

factors_filt.to_csv(DATA_DIR / "dl_factors_kalman_filtered_sample.csv")
factors_smooth.to_csv(DATA_DIR / "dl_factors_kalman_smoothed_sample.csv")

factors_smooth.head()
Out[22]:
level slope curvature
1990-01-31 0.078016 0.003215 0.014766
1990-02-28 0.075872 0.006037 0.020238
1990-03-31 0.079267 0.005030 0.012670
1990-04-30 0.077276 0.009562 0.023612
1990-05-31 0.072786 0.009663 0.024977

7) OLS vs Kalman factors (what should we expect?)¶

OLS and Kalman-smoothed factors can look very close in benign settings because:

  • the Nelson–Siegel cross-section is already very informative,
  • the VAR dynamics mostly provide gentle time-series regularization.

The real value of state-space estimation shows up when:

  • measurement noise is material,
  • missing observations occur,
  • we extend the model (e.g., AFNS adjustment),
  • we need probabilistic filtering objects (pred/filtered/smoothed) for decision-making.

This is a necessary stepping stone to AFNS and to dynamic hedging.

In [23]:
plt.figure(figsize=(10,4))
plt.plot(df_b_clean["level"], label="OLS", alpha=0.6)
plt.plot(factors_smooth["level"], label="Kalman smooth", alpha=0.8)
plt.title("Level factor – OLS vs Kalman")
plt.legend()
plt.grid(True)
plt.show()

plt.figure(figsize=(10,4))
plt.plot(df_b_clean["slope"], label="OLS", alpha=0.6)
plt.plot(factors_smooth["slope"], label="Kalman smooth", alpha=0.8)
plt.title("Slope factor – OLS vs Kalman")
plt.legend()
plt.grid(True)
plt.show()

plt.figure(figsize=(10,4))
plt.plot(df_b_clean["curvature"], label="OLS", alpha=0.6)
plt.plot(factors_smooth["curvature"], label="Kalman smooth", alpha=0.8)
plt.title("Curvature factor – OLS vs Kalman")
plt.legend()
plt.grid(True)
plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

8) Fully latent state-space estimation via MLE¶

Instead of treating OLS + VAR as “two-step”, we can estimate a coherent state-space model by maximum likelihood:

  • Transition parameters: $A, Q$
  • Measurement parameters: $R$ (and potentially $\lambda$ in some variants)

This step is closer to how term structure models are often estimated in practice: choose parameters to maximize the likelihood implied by the Kalman filter.

We start with sensible initial values and then optimize the negative log-likelihood.

In [24]:
def unpack_params(theta):
    """
    Map parameter vector theta (length 16) -> (A, Q, R_scalar).
    A: 3x3
    Q: 3x3 (PSD via Cholesky L L')
    R: scalar variance (we'll build R = R * I outside)
    """
    theta = np.asarray(theta)
    assert theta.size == 16

    # A entries
    A_flat = theta[0:9]
    A = A_flat.reshape(3, 3)

    # Cholesky L parameters for Q
    l11, l21, l22, l31, l32, l33 = theta[9:15]

    L = np.array([
        [np.exp(l11),     0.0,          0.0],
        [l21,         np.exp(l22),      0.0],
        [l31,             l32,      np.exp(l33)]
    ])
    Q = L @ L.T

    # Measurement variance
    log_sigma = theta[15]
    sigma = np.exp(log_sigma)
    R_scalar = sigma**2

    return A, Q, R_scalar
In [25]:
def kalman_loglik(theta, Y, H, beta0=None, P0=None):
    """
    Negative log-likelihood for given theta, using Kalman filter.

    theta: parameter vector (length 16)
    Y: (T, n) array of yields
    H: (n, 3) loadings matrix
    beta0: initial state mean (3,)
    P0: initial state covariance (3,3)

    Returns: negative log-likelihood (float)
    """
    Y = np.asarray(Y)
    T, n = Y.shape

    A, Q, R_scalar = unpack_params(theta)
    R = R_scalar * np.eye(n)

    k = 3
    if beta0 is None:
        beta0 = np.zeros(k)
    if P0 is None:
        P0 = 10.0 * np.eye(k)

    beta_prev = beta0
    P_prev = P0

    I_k = np.eye(k)

    loglik = 0.0
    const = n * np.log(2 * np.pi)

    for t in range(T):
        # Prediction
        beta_pred = A @ beta_prev
        P_pred = A @ P_prev @ A.T + Q

        y_t = Y[t, :]  # (n,)

        # Innovation
        S_t = H @ P_pred @ H.T + R  # (n,n)
        try:
            S_inv = np.linalg.inv(S_t)
            sign, logdet = np.linalg.slogdet(S_t)
            if sign <= 0:
                # Penalize non-PD S_t
                return 1e6
        except np.linalg.LinAlgError:
            return 1e6

        y_hat = H @ beta_pred
        innov = y_t - y_hat  # (n,)

        # Contribution to log-likelihood
        quad = innov.T @ S_inv @ innov
        loglik_t = -0.5 * (const + logdet + quad)
        loglik += loglik_t

        # Kalman update (for next step)
        K_t = P_pred @ H.T @ S_inv  # (3,n)
        beta_filt = beta_pred + K_t @ innov
        P_filt = (I_k - K_t @ H) @ P_pred

        beta_prev, P_prev = beta_filt, P_filt

    # We return negative log-likelihood for minimization
    return -float(loglik)
In [26]:
def cholesky_param_from_Q(Q):
    """
    Take a 3x3 PSD Q and get Cholesky parameter vector
    (l11, l21, l22, l31, l32, l33)
    such that L L' = Q and L has exp(diag) structure.
    """
    L0 = np.linalg.cholesky(Q)
    # enforce positive diag via exp parameterization
    l11 = np.log(L0[0,0])
    l22 = np.log(L0[1,1])
    l33 = np.log(L0[2,2])

    # off-diagonals stay as is
    l21 = L0[1,0]
    l31 = L0[2,0]
    l32 = L0[2,1]

    return np.array([l11, l21, l22, l31, l32, l33])

def initial_theta_from_2step(A_init, Q_init, R_sigma2_init):
    A_flat = A_init.flatten()
    l_params = cholesky_param_from_Q(Q_init)
    log_sigma0 = 0.5 * np.log(R_sigma2_init)
    theta0 = np.concatenate([A_flat, l_params, np.array([log_sigma0])])
    return theta0

9) Initialization choices for MLE¶

State-space likelihood optimization is sensitive to starting points.

We use:

  • a stable initial $A$ (eigenvalues inside the unit circle),
  • small but non-zero $Q$ to allow realistic factor innovations,
  • diagonal $R$ as a parsimonious first approximation to measurement noise.

The goal is not “perfect initialization”, but a starting point that avoids numerical pathologies and lets the optimizer find a plausible region of parameter space.

In [27]:
# 1. Initial A, Q, R
A_init = np.array([
    [0.98,  0.01,  0.00],
    [0.00,  0.90,  0.05],
    [0.00, -0.05,  0.80]
])

Q_init = np.array([
    [0.0005, 0.0,     0.0],
    [0.0,    0.0010,  0.0],
    [0.0,    0.0,     0.0010]
])

R_sigma2_init = 1e-4

# 2. Turn Q into Cholesky parameters
l_params = cholesky_param_from_Q(Q_init)

# 3. Flatten A and make log-sigma
A_flat = A_init.flatten()
log_sigma0 = 0.5 * np.log(R_sigma2_init)

# 4. Full initial vector (length 16)
theta0 = np.concatenate([
    A_flat,
    l_params,
    np.array([log_sigma0])
])

print(theta0)
[ 0.98        0.01        0.          0.          0.9         0.05
  0.         -0.05        0.8        -3.80045123  0.         -3.45387764
  0.          0.         -3.45387764 -4.60517019]

10) Maximum likelihood estimation¶

We optimize the negative log-likelihood produced by the Kalman filter.

Practical notes:

  • we typically enforce stability / positivity constraints implicitly (e.g., parameterizing variances in log-space),
  • we monitor convergence and sanity-check parameter magnitudes,
  • the end product is a set of parameters that make the observed yield panel most likely under the model.
In [28]:
from scipy.optimize import minimize

# Y, H from cleaned dataset and loadings
Y = df_y_clean.values.astype(float)
maturities = np.array([float(c) for c in df_y_clean.columns])
H = dl_loadings(maturities, lam)  # (n,3)

beta0 = df_b_clean.iloc[0].values  # or zeros
P0 = np.eye(3) * 1.0

def objective(theta):
    return kalman_loglik(theta, Y, H, beta0=beta0, P0=P0)

res = minimize(
    objective,
    theta0,
    method="L-BFGS-B",
    options={"maxiter": 200, "disp": True}
)

print("Converged:", res.success)
print("Final neg loglik:", res.fun)
Converged: True
Final neg loglik: -65595.06096658931

11) Estimated parameters and interpretation¶

After optimization, we extract:

  • $A$: persistence and cross-factor transmission
  • $Q$: variance of factor shocks (state noise)
  • $R$: measurement noise by maturity (observation noise)

A useful sanity check is that:

  • $A$ implies persistent but stable factors,
  • $Q$ is not degenerate (not all zeros),
  • $R$ does not explode for specific maturities (unless data quality demands it).
In [29]:
theta_hat = res.x
A_hat, Q_hat, R_scalar_hat = unpack_params(theta_hat)
R_hat = R_scalar_hat * np.eye(Y.shape[1])

12) Smoothed factors under MLE¶

With the MLE parameters fixed, we re-run the Kalman filter and smoother to produce the final factor estimates. We also estimate $\lambda$ through MLE.

These smoothed factors are the state variables we will carry forward into:

  • AFNS (no-arbitrage yield adjustment),
  • balance sheet / NII simulation,
  • control and RL environments.

From this point, the modeling focus shifts from “fit the curve” to “use the curve as a state in a decision problem”.

In [30]:
beta_filt_mle, beta_smooth_mle = kalman_filter_smoother(
    Y, H, A_hat, Q_hat, R_hat,
    beta0=beta0,
    P0=P0
)

factors_smooth_mle = pd.DataFrame(
    beta_smooth_mle,
    index=df_y_clean.index,
    columns=["level","slope","curvature"]
)
In [31]:
def mle_given_lambda(lam, Y, maturities, theta0):
    H = dl_loadings(maturities, lam)

    def objective(theta):
        return kalman_loglik(theta, Y, H, beta0=beta0, P0=P0)

    res = minimize(objective, theta0, method="L-BFGS-B",
                   options={"maxiter": 200})
    return res.fun, res.x  # neg loglik, theta_hat
In [32]:
Y = df_y_clean.values
maturities = np.array([float(c) for c in df_y_clean.columns])

lambda_grid = np.linspace(0.01, 1.2, 30)  # adjust as you like

best_val = np.inf
best_lam = None
best_theta = None

for lam_try in lambda_grid:
    neg_ll, theta_hat = mle_given_lambda(lam_try, Y, maturities, theta0)
    print(lam_try, neg_ll)
    if neg_ll < best_val:
        best_val = neg_ll
        best_lam = lam_try
        best_theta = theta_hat

print("Best lambda:", best_lam, "neg loglik:", best_val)

# Summary of all ML estimates we will carry forward
A_hat, Q_hat, R_scalar_hat = unpack_params(best_theta)
R_hat = R_scalar_hat * np.eye(Y.shape[1])
H_hat = dl_loadings(maturities, best_lam)
0.01 -58523.40386389547
0.05103448275862069 -68235.44496930015
0.09206896551724138 -69411.19470853177
0.13310344827586207 -69003.10620696761
0.17413793103448277 -68297.39647498455
0.21517241379310348 -68300.57319024569
0.25620689655172413 -68504.16957735507
0.29724137931034483 -67714.65179067253
0.33827586206896554 -67569.73645112943
0.37931034482758624 -67542.85889860686
0.42034482758620695 -67685.70734203488
0.4613793103448276 -67438.99810142393
0.5024137931034482 -66801.69443086201
0.543448275862069 -66676.98152542647
0.5844827586206897 -67163.57699394242
0.6255172413793104 -67431.45961477139
0.6665517241379311 -67198.9383200503
0.7075862068965517 -65940.00632280785
0.7486206896551725 -66564.35026607805
0.7896551724137931 -66795.19893583513
0.8306896551724139 -66349.21308869353
0.8717241379310345 -66461.79086466972
0.9127586206896552 -64896.40307781075
0.953793103448276 -64560.713959196124
0.9948275862068966 -64161.10854262734
1.0358620689655174 -65162.95527895225
1.076896551724138 -63435.62354720741
1.1179310344827587 -63281.93443362484
1.1589655172413793 -64318.54262469717
1.2 -63397.063454094154
Best lambda: 0.09206896551724138 neg loglik: -69411.19470853177

Part II — AFNS (Arbitrage-Free Nelson–Siegel)¶

From Diebold–Li to AFNS: No-Arbitrage Term Structure Modeling¶

This project models the yield curve using the Arbitrage-Free Nelson–Siegel (AFNS) framework originally introduced by Christensen, Diebold, and Rudebusch (2009). The AFNS model builds directly on the Diebold–Li dynamic Nelson–Siegel (DNS) model, enhancing it with no-arbitrage restrictions.


The Diebold–Li (Dynamic Nelson–Siegel) Model¶

The Diebold–Li model represents the zero-coupon yield curve at time $t$ as a linear function of three latent factors:

$ y_t(\tau)¶

L_t + S_t \frac{1 - e^{-\lambda \tau}}{\lambda \tau} + C_t \left( \frac{1 - e^{-\lambda \tau}}{\lambda \tau}¶

e^{-\lambda \tau} \right) $

where:

  • $L_t$ is the level factor,
  • $S_t$ is the slope factor,
  • $C_t$ is the curvature factor,
  • $\lambda$ controls factor loadings across maturities.

The factors evolve dynamically, typically as a VAR(1):

$ \mathbf{X}_{t+1}¶

\boldsymbol{\mu} + \Phi (\mathbf{X}t - \boldsymbol{\mu}) + \boldsymbol{\varepsilon}{t+1} $

The Diebold–Li model is:

  • parsimonious,
  • empirically successful,
  • and highly interpretable.

However, it is purely statistical.


The Key Limitation: Lack of No-Arbitrage¶

The Diebold–Li model does not impose no-arbitrage restrictions.

This has important consequences:

  • The model fits yields well, but
  • It does not guarantee that yields are consistent with the existence of an underlying stochastic discount factor,
  • It cannot be used coherently for pricing interest-rate-sensitive instruments.

In particular, nothing in the Diebold–Li model ensures that yields at different maturities are linked through arbitrage-free pricing relations.

This is acceptable for forecasting, but may pose problems for applications involving hedging, valuation, and balance-sheet risk. With the imposition of no arbitrage, we insure consistency between forward rates and offset exposures correctly when hedging.


Risk-Neutral Pricing and No-Arbitrage¶

In arbitrage-free term-structure models, bond prices are expectations under a risk-neutral probability measure $\mathbb{Q}$:

$ P_t(\tau)¶

\mathbb{E}^\mathbb{Q}_t \left[ \exp\left(

  • \int_t^{t+\tau} r_s , ds \right) \right] $

where:

  • $r_t$ is the instantaneous short rate,
  • risk premia are absorbed into the change of measure from the physical $\mathbb{P}$ to the risk-neutral $\mathbb{Q}$ measure.

In affine term-structure models, this leads to yields of the form:

$ y_t(\tau)¶

A(\tau) + B(\tau)^\top \mathbf{X}_t $

with $A(\tau)$ and $B(\tau)$ determined by:

  • the dynamics of $\mathbf{X}_t$ under $\mathbb{Q}$,
  • and the specification of the short rate.

This structure enforces internal consistency across maturities.


The AFNS Model: Making Nelson–Siegel Arbitrage-Free¶

The AFNS model preserves the Nelson–Siegel factor structure while embedding it into an affine no-arbitrage framework.

Short Rate Specification¶

The short rate is defined as a linear function of the Nelson–Siegel factors:

$ r_t = L_t + S_t $

This choice preserves the economic interpretation of the level and slope factors.


Risk-Neutral Dynamics¶

Under the risk-neutral measure $\mathbb{Q}$, the factors follow affine Gaussian dynamics:

$ d\mathbf{X}_t¶

K_\mathbb{Q} (\theta_\mathbb{Q} - \mathbf{X}_t) , dt + \Sigma , d\mathbf{W}^\mathbb{Q}_t $

These continuous-time dynamics imply closed-form expressions for bond prices and yields.


The Yield Adjustment Term¶

The key difference between DNS and AFNS lies in the yield adjustment term.

Observed yields satisfy: $ y_t(\tau)¶

A(\tau) + B(\tau)^\top \mathbf{X}_t + \eta_t $

where:

  • $B(\tau)$ has the same Nelson–Siegel loadings as in Diebold–Li,
  • $A(\tau)$ is a maturity-dependent adjustment term.

This adjustment term:

  • depends on the factor volatilities,
  • captures Jensen’s inequality effects from stochastic discounting,
  • and ensures that yields satisfy no-arbitrage restrictions.

Importantly:

AFNS does not change the factor loadings. It changes the intercept.

This preserves interpretability while enforcing arbitrage-free pricing.


Relationship Between DNS and AFNS¶

The AFNS model can be viewed as:

Diebold–Li + a model-consistent yield adjustment term

Key implications:

  • DNS is recovered as a special case when volatilities vanish,
  • AFNS remains empirically flexible,
  • AFNS supports pricing, hedging, and risk-neutral valuation.

Thus, AFNS is a structural refinement, not a competing model.


Why AFNS is important in this project¶

This project studies:

  • interest-rate risk in the banking book,
  • dynamic hedging with interest-rate derivatives,
  • and optimal decision-making under uncertainty.

These tasks require:

  • consistent pricing across maturities,
  • coherent forward-rate dynamics,
  • and economically meaningful hedge payoffs.

AFNS provides:

  • a no-arbitrage state-space representation of the yield curve,
  • compatibility with Kalman filtering and smoothing,
  • and a principled foundation for both LQ control and reinforcement learning.

Conceptual Summary¶

  • Diebold–Li offers a flexible statistical representation of the yield curve.
  • AFNS embeds this representation into an affine no-arbitrage framework.
  • The adjustment term $A(\tau)$ enforces pricing consistency without sacrificing interpretability.
  • This makes AFNS the natural choice for applications that bridge econometrics, pricing, and dynamic hedging.

Reference: Christensen, Diebold, and Rudebusch (2009), “The Affine Arbitrage-Free Class of Nelson–Siegel Term Structure Models.”

AFNS approach and implementation used here¶

We implement AFNS as an extension on top of our DNS implementation:

  • Keep factor dynamics under the physical measure $\mathbb{P}$ (estimated from data).
  • Modify the measurement equation by adding the AFNS adjustment term.

This is a pragmatic “best of both worlds” approach:

  • retains the DNS interpretability and estimation pipeline,
  • introduces no-arbitrage consistency in yield construction.

Also be careful not to confuse $A$, the matrix of DNS VAR coefficients, with $A(\tau)$, the AFNS no-arbitrage adjustment term

In [33]:
from scipy.linalg import logm

def compute_K_from_A(A, delta_t=1/12):
    """
    Given discrete-time A (3x3) and time step delta_t in years (monthly = 1/12),
    approximate continuous-time K via matrix logarithm.
    """
    A = np.asarray(A)
    K = - (1.0 / delta_t) * logm(A)  # <-- minus sign here
    K = np.real_if_close(K)
    return K

def afns_AB_grid(K, Q, delta0, delta1, theta, tau_grid, n_steps=200):
    """
    Compute A(tau), B(tau) on a grid of maturities tau_grid (in years)
    for an AFNS-like model with:
      dX_t = K (theta - X_t) dt + noise
      r_t = delta0 + delta1' X_t

    Uses simple Euler integration of the ODEs:
      dB/dtau = -K' B - delta1
      dA/dtau = -delta0 - (K theta)' B + 0.5 B' Q B

    K: (3,3)
    Q: (3,3) continuous-time state covariance
    delta0: scalar
    delta1: (3,) vector
    theta: (3,) long-run mean
    tau_grid: array of maturities in years
    n_steps: steps per year for numerical integration
    """
    K = np.asarray(K)
    Q = np.asarray(Q)
    delta1 = np.asarray(delta1).reshape(3,)
    theta = np.asarray(theta).reshape(3,)

    tau_grid = np.asarray(tau_grid)
    taus_sorted = np.sort(tau_grid)
    max_tau = taus_sorted[-1]

    dtau = 1.0 / n_steps  # step in years
    n_iter = int(max_tau / dtau) + 1

    B = np.zeros((3,))  # B(0)
    A = 0.0             # A(0)

    A_vals = {}
    B_vals = {}

    current_tau = 0.0
    idx_tau = 0

    K_T = K.T
    Ktheta = K @ theta

    for i in range(n_iter):
        # store values when we cross a tau in tau_grid
        while idx_tau < len(taus_sorted) and current_tau >= taus_sorted[idx_tau] - 1e-8:
            tau_val = taus_sorted[idx_tau]
            A_vals[tau_val] = A
            B_vals[tau_val] = B.copy()
            idx_tau += 1
            if idx_tau >= len(taus_sorted):
                break

        if idx_tau >= len(taus_sorted):
            break

        # ODEs:
        dB = -(K_T @ B) - delta1
        dA = -delta0 - (Ktheta @ B) + 0.5 * (B @ Q @ B)

        B = B + dB * dtau
        A = A + dA * dtau

        current_tau += dtau

    # Convert dicts to arrays aligned with original tau_grid order
    A_array = np.array([A_vals[tau] for tau in tau_grid])
    B_array = np.vstack([B_vals[tau] for tau in tau_grid])  # (n_tau, 3)

    return A_array, B_array
In [34]:
class AFNSFromDL:
    def __init__(self, A_P, Q_P, factors_df, maturities, delta0=None, delta1=None, delta_t=1/12):
        """
        A_P, Q_P: discrete-time DL MLE dynamics (3x3 each)
        factors_df: DataFrame with columns ['level','slope','curvature']
        maturities: array-like of maturities in years (e.g. [1.0, 2.0, ..., 30.0])
        delta0: scalar for short-rate intercept (if None, set to 0)
        delta1: length-3 array (if None, default [1,1,0])
        delta_t: time step in years for A_P (monthly = 1/12)
        """
        self.A_P = np.asarray(A_P)
        self.Q_P = np.asarray(Q_P)
        self.factors = factors_df
        self.maturities = np.asarray(maturities)
        self.delta_t = delta_t

        self.K = compute_K_from_A(self.A_P, delta_t=delta_t)
        # crude continuous-time Q: scale discrete Q by 1/delta_t
        self.Q_ct = self.Q_P * delta_t

        if delta1 is None:
            self.delta1 = np.array([1.0, 1.0, 0.0])
        else:
            self.delta1 = np.asarray(delta1).reshape(3,)

        if delta0 is None:
            self.delta0 = 0.0
        else:
            self.delta0 = float(delta0)

        # long-run mean theta: sample mean of factors
        self.theta = self.factors[["level","slope","curvature"]].mean().values

        # precompute A(tau), B(tau) and build linear mapping
        self.A_tau, self.B_tau = afns_AB_grid(
            self.K, self.Q_ct, self.delta0, self.delta1, self.theta,
            tau_grid=self.maturities
        )
        # Mapping: y_t(tau_i) = a_i + M_i dot X_t
        # with a_i = -A(tau_i)/tau_i, M_i = -B(tau_i)/tau_i
        self.a_vec = -self.A_tau / self.maturities
        self.M_mat = -self.B_tau / self.maturities[:, None]  # (n_tau, 3)

    def yields_from_factors(self, X_t):
        """
        Given X_t = [level, slope, curvature], return AFNS zero-coupon yields
        at all self.maturities.
        """
        X_t = np.asarray(X_t).reshape(3,)
        return self.a_vec + self.M_mat @ X_t

    def yields_from_path(self):
        """
        Apply AFNS mapping to the whole factor path.
        Returns DataFrame: index like factors_df, columns = maturities as strings.
        """
        X = self.factors[["level","slope","curvature"]].values  # (T,3)
        Y = X @ self.M_mat.T + self.a_vec  # (T, n_tau)
        cols = [f"{m:.1f}" for m in self.maturities]
        return pd.DataFrame(Y, index=self.factors.index, columns=cols)
In [35]:
# Suppose you have:
# A_hat, Q_hat from DL MLE
# factors_smooth_mle: DataFrame with level/slope/curvature
# maturities: e.g. np.array([1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 20.0, 30.0])

afns_model = AFNSFromDL(
    A_P=A_hat,
    Q_P=Q_hat,
    factors_df=factors_smooth_mle,
    maturities=np.array([1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 20.0, 30.0])
)

afns_yields = afns_model.yields_from_path()
afns_yields.head()
Out[35]:
1.0 2.0 3.0 5.0 7.0 10.0 20.0 30.0
date
1990-01-31 0.098342 0.099302 0.089368 0.046221 -0.023844 -0.177594 -1.126377 -2.799207
1990-02-28 0.099142 0.100048 0.090092 0.046931 -0.023138 -0.176886 -1.125657 -2.798470
1990-03-31 0.101035 0.101805 0.091783 0.048563 -0.021530 -0.175293 -1.124059 -2.796845
1990-04-30 0.105012 0.106307 0.096513 0.053477 -0.016537 -0.170230 -1.118841 -2.791492
1990-05-31 0.100435 0.101578 0.091719 0.048629 -0.021414 -0.175142 -1.123866 -2.796636

AFNS approach used here¶

We implement AFNS as an extension on top of DNS:

  • Keep factor dynamics under the physical measure $\mathbb{P}$ (estimated from data).
  • Modify the measurement equation by adding the AFNS adjustment term.

This is a pragmatic “best of both worlds” approach:

  • retains the DNS interpretability and estimation pipeline,
  • introduces no-arbitrage consistency in yield construction.
In [36]:
def ns_loadings(tau, lam):
    """
    Nelson–Siegel factor loadings:
    B1(tau) = 1
    B2(tau) = (1 - exp(-lam*tau)) / (lam*tau)
    B3(tau) = B2(tau) - exp(-lam*tau)
    """
    tau = np.asarray(tau, dtype=float)
    eps = 1e-8
    x = lam * np.maximum(tau, eps)
    exp_term = np.exp(-x)

    B1 = np.ones_like(tau)
    B2 = (1.0 - exp_term) / x
    B3 = B2 - exp_term
    return B1, B2, B3
In [37]:
def afns_yield_adjustment(tau, lam, sig1, sig2, sig3):
    """
    Independent-factor AFNS yield-adjustment term:
    C(t,T)/(T-t) as function of tau, lam, sigma1..3.
    """
    tau = np.asarray(tau, dtype=float)
    lam = float(lam)

    s1, s2, s3 = float(sig1), float(sig2), float(sig3)

    eps = 1e-8
    tau_safe = np.maximum(tau, eps)
    x = lam * tau_safe
    e1 = np.exp(-x)
    e2 = np.exp(-2 * x)

    # Level factor contribution
    I1 = (s1**2 / 6.0) * tau_safe**2

    # Slope factor contribution
    I2 = s2**2 * (
        1.0 / (2.0 * lam**2)
        - (1.0 / lam**3) * (1.0 - e1) / tau_safe
        + (1.0 / (4.0 * lam**3)) * (1.0 - e2) / tau_safe
    )

    # Curvature factor contribution
    I3 = s3**2 * (
        1.0 / (2.0 * lam**2)
        + (1.0 / lam**2) * e1
        - (1.0 / (4.0 * lam)) * tau_safe * e2
        - (3.0 / (4.0 * lam**2)) * e2
        - (2.0 / lam**3) * (1.0 - e1) / tau_safe
        + (5.0 / (8.0 * lam**3)) * (1.0 - e2) / tau_safe
    )

    return I1 + I2 + I3
In [38]:
def build_measurement_matrices(taus, lam, sig1, sig2, sig3):
    """
    Build:
    - H: N x 3 factor loading matrix
    - a: N-dimensional intercept vector (AFNS adj term)
    used in y_t = a + H X_t + eps_t
    """
    taus = np.asarray(taus, dtype=float)

    B1, B2, B3 = ns_loadings(taus, lam)
    H = np.column_stack([B1, B2, B3])

    C_adj = afns_yield_adjustment(taus, lam, sig1, sig2, sig3)
    a = -C_adj

    return H, a

Utility functions: DNS yields and parameter vector¶

We keep helper functions for:

  • generating DNS yields from factors (baseline reference),
  • unpacking the parameter vector $\theta$ used in optimization.

Packing parameters into a vector is standard for numerical optimization; unpacking makes the model readable and reduces bugs when mapping parameters to matrices $(\Phi, Q, R$, etc.).

In [39]:
def dns_yields_from_factors(X_t, taus, lam):
    """
    Produce DNS/Nelson–Siegel yields from factor vector X_t=[L,S,C]
    """
    X_t = np.asarray(X_t, dtype=float)
    L_t, S_t, C_t = X_t
    B1, B2, B3 = ns_loadings(taus, lam)
    return L_t * B1 + S_t * B2 + C_t * B3


def afns_yields_from_factors(X_t, taus, lam, sig1, sig2, sig3):
    """
    AFNS yields = DNS yields - no-arbitrage adjustment
    """
    dns = dns_yields_from_factors(X_t, taus, lam)
    adj = afns_yield_adjustment(taus, lam, sig1, sig2, sig3)
    return dns - adj
In [40]:
def unpack_theta(theta):
    """
    Unpack the 14-parameter AFNS reduced-form vector.
    """
    theta = np.asarray(theta, dtype=float)

    phi_L, phi_S, phi_C = theta[0:3]
    mu_L,  mu_S,  mu_C  = theta[3:6]
    log_qL, log_qS, log_qC = theta[6:9]
    log_lam = theta[9]
    log_sig1, log_sig2, log_sig3 = theta[10:13]
    log_r = theta[13]

    Phi = np.diag([phi_L, phi_S, phi_C])
    mu  = np.array([mu_L, mu_S, mu_C])
    Q   = np.diag([np.exp(log_qL)**2,
                   np.exp(log_qS)**2,
                   np.exp(log_qC)**2])

    lam = np.exp(log_lam)
    sig1, sig2, sig3 = np.exp(log_sig1), np.exp(log_sig2), np.exp(log_sig3)
    r = np.exp(log_r)

    return Phi, mu, Q, lam, sig1, sig2, sig3, r

CurveModelAFNS: reusable curve model object¶

CurveModelAFNS is the “model object” used downstream.

Responsibilities:

  • store estimated parameters,
  • simulate factor paths under $\mathbb{P}$-dynamics,
  • transform factors into yields under DNS or AFNS.

This is useful because later sections (NII, LQ, RL) can treat the term structure model as a black box that produces:

  • state variables (factors),
  • and market observables (yields/forwards).
In [41]:
class CurveModelAFNS:
    """
    DNS/AFNS term structure model with independent AR(1) P-dynamics
    and AFNS no-arbitrage adjustment in measurement eq.
    """

    def __init__(self, theta_hat, taus):
        """
        theta_hat: estimated 14-parameter vector
        taus: maturities in years (array-like)
        """
        self.theta = np.asarray(theta_hat, dtype=float)
        self.taus = np.asarray(taus, dtype=float)

        (
            self.Phi,
            self.mu,
            self.Q,
            self.lam,
            self.sig1,
            self.sig2,
            self.sig3,
            self.r,
        ) = unpack_theta(self.theta)

        # Build AFNS measurement structures
        self.H, self.a = build_measurement_matrices(
            self.taus, self.lam, self.sig1, self.sig2, self.sig3
        )
        self.R = (self.r**2) * np.eye(len(self.taus))

    # ---------- simulators ----------

    def simulate_factors(self, T, x0=None, rng=None):
        """
        Simulate factor path X_t under P-dynamics (AR(1)) for t=0..T-1.
        Returns array (T, 3).
        """
        if rng is None:
            rng = np.random.default_rng()

        if x0 is None:
            x0 = self.mu.copy()

        X = np.zeros((T, 3))
        X[0] = x0

        for t in range(1, T):
            eps = rng.multivariate_normal(mean=np.zeros(3), cov=self.Q)
            X[t] = self.mu + self.Phi @ (X[t-1] - self.mu) + eps

        return X

    def simulate_yields(self, X, model="afns"):
        """
        Given factor path X (T,3), return yields (T, N) under DNS or AFNS.
        """
        X = np.asarray(X, dtype=float)
        T = X.shape[0]
        N = len(self.taus)
        Y = np.zeros((T, N))

        for t in range(T):
            if model == "dns":
                Y[t] = dns_yields_from_factors(X[t], self.taus, self.lam)
            elif model == "afns":
                Y[t] = afns_yields_from_factors(
                    X[t], self.taus, self.lam, self.sig1, self.sig2, self.sig3
                )
            else:
                raise ValueError("model must be 'dns' or 'afns'")

        return Y

AFNS estimation step¶

We estimate AFNS parameters by maximizing the likelihood of the yield panel under the AFNS state-space model.

Compared to DNS:

  • the observation equation includes the AFNS adjustment,
  • additional parameters govern the adjustment term (volatility-related).

The output is a single parameter vector $\theta$ that defines both:

  • the factor dynamics,
  • and the measurement mapping implied by no-arbitrage.
In [42]:
def kalman_loglik_afns(theta, Y, taus, beta0=None, P0=None):
    """
    AFNS Kalman log-likelihood.

    Model:
      X_t - mu = Phi (X_{t-1} - mu) + eta_t,  eta_t ~ N(0, Q)
      y_t = a + H X_t + eps_t,                 eps_t ~ N(0, R)

    where (Phi, mu, Q, lam, sig1..3, r) = unpack_theta(theta)
    and H, a are built via AFNS (no-arbitrage adjustment).

    Parameters
    ----------
    theta : array-like, shape (14,)
        Parameter vector as defined in unpack_theta.
    Y : array-like, shape (T, N)
        Observed yields (T time points, N maturities).
    taus : array-like, shape (N,)
        Maturities in years corresponding to columns of Y.
    beta0 : array-like, shape (3,), optional
        Initial state mean; if None, we use mu.
    P0 : array-like, shape (3, 3), optional
        Initial state covariance; if None, we use 0.1 * I.

    Returns
    -------
    neg_loglik : float
        Negative log-likelihood (for minimization).
    """
    Y = np.asarray(Y, dtype=float)
    T, N = Y.shape
    taus = np.asarray(taus, dtype=float)

    # Unpack parameters
    Phi, mu, Q, lam, sig1, sig2, sig3, r = unpack_theta(theta)

    # Measurement matrices (AFNS)
    H, a = build_measurement_matrices(taus, lam, sig1, sig2, sig3)
    R = (r**2) * np.eye(N)

    # Adjust observations to absorb intercept: y'_t = y_t - a
    Y_adj = Y - a[None, :]

    k = 3  # number of factors
    if beta0 is None:
        beta0 = mu.copy()
    if P0 is None:
        P0 = 0.1 * np.eye(k)

    beta_prev = beta0
    P_prev = P0
    I_k = np.eye(k)

    loglik = 0.0
    const = N * np.log(2 * np.pi)

    for t in range(T):
        # Prediction step: X_t|t-1
        beta_pred = mu + Phi @ (beta_prev - mu)
        P_pred = Phi @ P_prev @ Phi.T + Q

        # Observation for this time
        y_t = Y_adj[t, :]          # (N,)

        # Innovation covariance
        S_t = H @ P_pred @ H.T + R  # (N, N)

        try:
            S_inv = np.linalg.inv(S_t)
            sign, logdet = np.linalg.slogdet(S_t)
            if sign <= 0:
                # non-PD covariance → penalize
                return 1e6
        except np.linalg.LinAlgError:
            return 1e6

        # Innovation
        y_hat = H @ beta_pred      # (N,)
        innov = y_t - y_hat        # (N,)

        quad = innov.T @ S_inv @ innov
        loglik_t = -0.5 * (const + logdet + quad)
        loglik += loglik_t

        # Update step
        K_t = P_pred @ H.T @ S_inv      # (3, N)
        beta_filt = beta_pred + K_t @ innov
        P_filt = (I_k - K_t @ H) @ P_pred

        beta_prev, P_prev = beta_filt, P_filt

    # Return negative log-likelihood for minimization
    return -float(loglik)
In [43]:
def make_initial_theta(Y, taus):
    """
    Construct a rough initial guess for the 14-parameter AFNS vector.
    Y: (T, N) yields
    taus: (N,) maturities

    Returns
    -------
    theta0 : np.ndarray, shape (14,)
    """
    Y = np.asarray(Y, dtype=float)
    T, N = Y.shape

    # crude guesses
    # factor means: use average of shortest maturity for level, 0 for slope/curvature
    mu_L0 = float(Y[:, 0].mean())
    mu_S0 = 0.0
    mu_C0 = 0.0

    # AR coefficients: persistent level, less for slope/curvature
    phi_L0, phi_S0, phi_C0 = 0.98, 0.90, 0.80

    # state noise std devs (log scale)
    log_qL0 = np.log(0.01)
    log_qS0 = np.log(0.02)
    log_qC0 = np.log(0.02)

    # lambda around typical NS values (e.g. DL ~ 0.06–0.1 for monthly)
    log_lam0 = np.log(0.06)

    # AFNS vol parameters (continuous-time vols for level, slope, curvature)
    log_sig1_0 = np.log(0.01)
    log_sig2_0 = np.log(0.02)
    log_sig3_0 = np.log(0.02)

    # measurement noise std dev
    log_r0 = np.log(0.001)

    theta0 = np.array([
        phi_L0, phi_S0, phi_C0,
        mu_L0,  mu_S0,  mu_C0,
        log_qL0, log_qS0, log_qC0,
        log_lam0,
        log_sig1_0, log_sig2_0, log_sig3_0,
        log_r0
    ], dtype=float)

    return theta0


def fit_afns_mle(Y, taus, theta0=None, maxiter=300):
    """
    Estimate AFNS parameters by MLE via Kalman filter.

    Parameters
    ----------
    Y : (T, N) array
        Yield panel (time x maturities).
    taus : (N,) array
        Maturities in years (aligned with columns of Y).
    theta0 : array-like, optional
        Initial guess for parameters; if None, we use make_initial_theta().
    maxiter : int
        Maximum number of optimizer iterations.

    Returns
    -------
    theta_hat : np.ndarray
        Estimated parameter vector.
    res : OptimizeResult
        Full scipy.optimize result object.
    """
    Y = np.asarray(Y, dtype=float)
    taus = np.asarray(taus, dtype=float)

    if theta0 is None:
        theta0 = make_initial_theta(Y, taus)

    def objective(theta):
        return kalman_loglik_afns(theta, Y, taus)

    res = minimize(
        objective,
        theta0,
        method="L-BFGS-B",
        options={"maxiter": maxiter, "disp": True}
    )

    theta_hat = res.x
    return theta_hat, res

Maturities and yield matrix used for AFNS¶

Here we select a fixed set of maturities (e.g., 1y, 2y, 3y, ..., 30y) and build:

  • the yield matrix $Y$ as (T × N),
  • the maturity vector $\tau$ in years.

This consistent maturity grid is important for:

  • stable estimation,
  • clean stress tests,
  • and a well-defined mapping from factors to yields in the downstream hedging environment.
In [44]:
# Choose maturities (columns must exist in df_yields)
cols = ["1.0", "2.0", "3.0", "5.0", "7.0", "10.0", "20.0", "30.0"]
taus = np.array([float(c) for c in cols])

Y = df_yields[cols].values  # shape (T, N)

theta0 = make_initial_theta(Y, taus)
theta_hat, res = fit_afns_mle(Y, taus, theta0=theta0, maxiter=300)

print("Converged:", res.success)
print("Final negative log-likelihood:", res.fun)
print("Estimated parameters:", theta_hat)
Converged: True
Final negative log-likelihood: -17635.449541385406
Estimated parameters: [ 0.99014058  0.98892233  0.95661017  0.06655483 -0.04882837 -0.02951158
 -6.37122426 -5.89062987 -5.01021734 -1.54559202 -5.29410301 -5.8361747
 -3.28498768 -6.92484316]

Sanity-checking the fitted AFNS model¶

We instantiate the AFNS curve model and verify basic behavior:

  • yields are in a plausible range,
  • simulated yields move smoothly with factors,
  • the mapping is stable across maturities.

This “model health check” matters because any downstream hedging result is only meaningful if the term structure layer is sensible.

In [45]:
cm = CurveModelAFNS(theta_hat, taus)

# simulate 10 years monthly
T = 10 * 12
X_sim = cm.simulate_factors(T)
Y_dns = cm.simulate_yields(X_sim, model="dns")
Y_afns = cm.simulate_yields(X_sim, model="afns")
In [46]:
plt.plot(Y_afns)
Out[46]:
[<matplotlib.lines.Line2D at 0x2d7816760d0>,
 <matplotlib.lines.Line2D at 0x2d781676210>,
 <matplotlib.lines.Line2D at 0x2d781676350>,
 <matplotlib.lines.Line2D at 0x2d781676490>,
 <matplotlib.lines.Line2D at 0x2d7816765d0>,
 <matplotlib.lines.Line2D at 0x2d781676710>,
 <matplotlib.lines.Line2D at 0x2d781676850>,
 <matplotlib.lines.Line2D at 0x2d781676990>]
No description has been provided for this image

state_space.py

State space: filtering and smoothing for AFNS¶

For decision problems, we often want a clean state estimate at every time step.

The following functions provide:

  • Kalman filtering (online state estimation),
  • RTS smoothing (offline best estimate using the full sample),
  • log-likelihood computation for estimation.

The output (predicted/filtered/smoothed states) is also helpful for explaining uncertainty and model diagnostics.

In [47]:
def kalman_filter_afns(theta, Y, taus, beta0=None, P0=None):
    """
    Run the Kalman filter for the AFNS Option-B model.

    Model:
      X_t - mu = Phi (X_{t-1} - mu) + eta_t,  eta_t ~ N(0, Q)
      y_t = a + H X_t + eps_t,                eps_t ~ N(0, R)

    We absorb the intercept a into the observations:
      y'_t = y_t - a = H X_t + eps_t.

    Parameters
    ----------
    theta : array-like, shape (14,)
        Parameter vector as in unpack_theta.
    Y : (T, N) array
        Yield panel.
    taus : (N,) array
        Maturities in years.
    beta0 : (3,) array, optional
        Initial state mean; default = mu.
    P0 : (3,3) array, optional
        Initial state covariance; default = 0.1 * I.

    Returns
    -------
    filt_means : (T, 3)
        Filtered state means E[X_t | Y_1..t].
    filt_covs  : (T, 3, 3)
        Filtered covariance matrices.
    pred_means : (T, 3)
        One-step-ahead predicted means E[X_t | Y_1..t-1].
    pred_covs  : (T, 3, 3)
        One-step-ahead predicted covariances.
    loglik : float
        Total log-likelihood (same as in MLE, for reference).
    extra : dict
        Dict with (Phi, mu, Q, H, a, R) for reuse in smoother, plotting, etc.
    """
    Y = np.asarray(Y, dtype=float)
    taus = np.asarray(taus, dtype=float)
    T, N = Y.shape

    Phi, mu, Q, lam, sig1, sig2, sig3, r = unpack_theta(theta)
    H, a = build_measurement_matrices(taus, lam, sig1, sig2, sig3)
    R = (r**2) * np.eye(N)

    # absorb intercept
    Y_adj = Y - a[None, :]

    k = 3
    if beta0 is None:
        beta0 = mu.copy()
    if P0 is None:
        P0 = 0.1 * np.eye(k)

    filt_means = np.zeros((T, k))
    filt_covs  = np.zeros((T, k, k))
    pred_means = np.zeros((T, k))
    pred_covs  = np.zeros((T, k, k))

    beta_prev = beta0
    P_prev = P0
    I_k = np.eye(k)

    loglik = 0.0
    const = N * np.log(2 * np.pi)

    for t in range(T):
        # prediction
        beta_pred = mu + Phi @ (beta_prev - mu)
        P_pred = Phi @ P_prev @ Phi.T + Q

        pred_means[t] = beta_pred
        pred_covs[t] = P_pred

        y_t = Y_adj[t, :]          # (N,)
        S_t = H @ P_pred @ H.T + R # (N,N)

        # innovation covariance must be PD
        try:
            S_inv = np.linalg.inv(S_t)
            sign, logdet = np.linalg.slogdet(S_t)
            if sign <= 0:
                raise np.linalg.LinAlgError("Non-PD innovation covariance")
        except np.linalg.LinAlgError:
            # error handling
            return None, None, None, None, -np.inf, {}

        y_hat = H @ beta_pred      # (N,)
        innov = y_t - y_hat        # (N,)

        quad = innov.T @ S_inv @ innov
        loglik_t = -0.5 * (const + logdet + quad)
        loglik += loglik_t

        # update
        K_t = P_pred @ H.T @ S_inv     # (3,N)
        beta_filt = beta_pred + K_t @ innov
        P_filt = (I_k - K_t @ H) @ P_pred

        filt_means[t] = beta_filt
        filt_covs[t] = P_filt

        beta_prev, P_prev = beta_filt, P_filt

    extra = {
        "Phi": Phi,
        "mu": mu,
        "Q": Q,
        "H": H,
        "a": a,
        "R": R,
    }

    return filt_means, filt_covs, pred_means, pred_covs, float(loglik), extra

Predicted vs filtered vs smoothed states¶

  • Predicted $X_{t|t-1}$: what the model expects before seeing data at time $t$.
  • Filtered $X_{t|t}$: updated estimate after observing yields at $t$.
  • Smoothed $X_{t|T}$: best estimate using all observations $1,\ldots,T$.

For hedging experiments in this notebook we mainly use smoothed factors as a clean, denoised “state history” to drive baseline paths and stress tests.

In [48]:
def rts_smoother_afns(filt_means, filt_covs, pred_means, pred_covs, Phi):
    """
    Rauch–Tung–Striebel smoother for AFNS model.

    Parameters
    ----------
    filt_means : (T, 3)
        Filtered means from Kalman filter.
    filt_covs  : (T, 3, 3)
        Filtered covariances.
    pred_means : (T, 3)
        One-step-ahead predicted means.
    pred_covs  : (T, 3, 3)
        One-step-ahead predicted covariances.
    Phi : (3,3)
        State transition matrix (constant over time in this model).

    Returns
    -------
    smooth_means : (T, 3)
        Smoothed state means E[X_t | Y_1..T].
    smooth_covs  : (T, 3, 3)
        Smoothed state covariances.
    """
    filt_means = np.asarray(filt_means, dtype=float)
    filt_covs  = np.asarray(filt_covs, dtype=float)
    pred_means = np.asarray(pred_means, dtype=float)
    pred_covs  = np.asarray(pred_covs, dtype=float)

    T, k = filt_means.shape
    smooth_means = np.zeros_like(filt_means)
    smooth_covs  = np.zeros_like(filt_covs)

    # initialize at T-1
    smooth_means[-1] = filt_means[-1]
    smooth_covs[-1]  = filt_covs[-1]

    Phi_T = Phi.T

    for t in range(T - 2, -1, -1):
        P_filt_t = filt_covs[t]
        P_pred_next = pred_covs[t + 1]

        # smoother gain
        J_t = P_filt_t @ Phi_T @ np.linalg.inv(P_pred_next)

        # smoothed mean
        smooth_means[t] = (
            filt_means[t]
            + J_t @ (smooth_means[t + 1] - pred_means[t + 1])
        )

        # smoothed covariance
        smooth_covs[t] = (
            P_filt_t
            + J_t @ (smooth_covs[t + 1] - P_pred_next) @ J_t.T
        )

    return smooth_means, smooth_covs

AFNS filtering/smoothing output¶

We run the AFNS filter/smoother and compare the resulting factor estimates to the simpler DNS versions.

At this point, we have a complete term-structure layer:

  • no-arbitrage consistent measurement equation,
  • estimated factor dynamics,
  • and a usable state vector $X_t = (L_t, S_t, C_t)$.

Next we shift from modeling to decision-making: define a simplified banking book and formulate hedging as a control/RL problem.

In [49]:
# theta_hat from MLE, Y and taus from GSW data
filt_means, filt_covs, pred_means, pred_covs, loglik, extra = kalman_filter_afns(
    theta_hat, Y, taus
)

Phi = extra["Phi"]

smooth_means, smooth_covs = rts_smoother_afns(
    filt_means, filt_covs, pred_means, pred_covs, Phi
)

# smooth_means is (T,3): AFNS factors L_t, S_t, C_t
L = smooth_means[:, 0]
S = smooth_means[:, 1]
C = smooth_means[:, 2]
In [60]:
plt.plot(filt_means)
plt.legend(["Level", "Slope", "Curvature"])
Out[60]:
<matplotlib.legend.Legend at 0x2d787cfc190>
No description has been provided for this image
In [61]:
plt.plot(smooth_means)
plt.legend(["Level", "Slope", "Curvature"])
Out[61]:
<matplotlib.legend.Legend at 0x2d787d6c550>
No description has been provided for this image

Part III — Banking book and NII hedging experiment design¶

From State-Space Modeling to Optimal Control: Kalman Filtering and LQ Control¶

We now build on the state-space representation of the yield curve provided by the AFNS model to study estimation, prediction, and optimal hedging decisions. Once interest rate dynamics are expressed in state-space form, two powerful and closely related tools become available:

  1. Kalman filtering and smoothing, for inference on latent states;
  2. Optimal control theory, for designing dynamic hedging policies.

This section explains how these tools arise naturally from the AFNS state-space structure and how they are used in this work.


State-Space Structure as the Unifying Framework¶

Recall that the AFNS model provides a linear Gaussian state-space representation of the yield curve:

State (transition) equation¶

$ \mathbf{X}_{t+1}¶

\boldsymbol{\mu} + \Phi (\mathbf{X}t - \boldsymbol{\mu}) + \boldsymbol{\varepsilon}{t+1}, \qquad \boldsymbol{\varepsilon}_{t+1} \sim \mathcal{N}(0, Q) $

Measurement equation¶

$ \mathbf{y}_t¶

H \mathbf{X}_t + \mathbf{a} + \boldsymbol{\eta}_t, \qquad \boldsymbol{\eta}_t \sim \mathcal{N}(0, R) $

Here:

  • $\mathbf{X}_t = (L_t, S_t, C_t)$ are latent yield-curve factors,
  • $\mathbf{y}_t$ are observed yields across maturities.

This representation separates dynamics (how rates evolve) from measurement (how rates are observed), which is the key prerequisite for both filtering and control.


Kalman Filtering: Inference on Latent Yield Factors¶

The Kalman filter is an estimator for linear Gaussian state-space models. In this project, it is used to infer the unobserved AFNS factors from observed yield data.

Prediction¶

Before observing yields at time $t$, the model produces a forecast: $ \hat{\mathbf{X}}_{t|t-1} = \boldsymbol{\mu} + \Phi (\hat{\mathbf{X}}_{t-1|t-1} - \boldsymbol{\mu}) $

This is a model-based prediction driven solely by the transition equation.


Filtering¶

After observing yields $\mathbf{y}_t$, the prediction is updated: $ \hat{\mathbf{X}}_{t|t}¶

\hat{\mathbf{X}}{t|t-1} + K_t \big( \mathbf{y}_t - H \hat{\mathbf{X}}{t|t-1} - \mathbf{a} \big) $

The Kalman gain $K_t$ balances:

  • confidence in the model (via $Q$),
  • confidence in the data (via $R$).

This filtered estimate represents real-time knowledge of the yield curve.


Smoothing¶

For structural analysis, the Rauch–Tung–Striebel (RTS) smoother is applied after filtering. It combines past, present, and future information to produce: $ \hat{\mathbf{X}}_{t|T} $

In this project, smoothed AFNS factors provide:

  • low-noise state estimates,
  • a stable reference path for calibration,
  • and a clean baseline for control experiments.

From Estimation to Decision-Making: Augmenting the State¶

To study hedging, the state vector is augmented to include the hedge inventory: $ \mathbf{s}_t = \begin{pmatrix} L_t \\ S_t \\ C_t \\ h_t \end{pmatrix} $

The augmented dynamics are linear: $ \mathbf{s}_{t+1} = A \mathbf{s}_t + B u_t + \boldsymbol{\xi}_{t+1} $

where:

  • $u_t = \Delta h_t$ is the hedge adjustment (control),
  • yield-curve factors evolve exogenously,
  • hedge inventory evolves deterministically given control.

This linear state-space system forms the basis of optimal control theory.


Optimal Control Theory in This Context¶

Optimal control asks:

Given stochastic state dynamics, how should control actions be chosen to optimize a long-run objective?

In this project, the objective is to stabilize Net Interest Income (NII) while controlling hedge usage.

NII is a linear function of the state: $ \text{NII}_{t+1} = C^\top \mathbf{s}_t + D h_t + \text{noise} $

This linear–Gaussian structure makes classical control tools applicable.


Linear–Quadratic (LQ) Control with L2 Costs¶

Quadratic Objective¶

The LQ framework assumes a quadratic objective: $ \min_{u_t} \mathbb{E} \sum_{t=0}^{\infty} \left( \text{NII}_{t+1}^2 + \lambda_h h_t^2 + \lambda_u u_t^2 \right) $

Interpretation:

  • penalize NII volatility (interest rate risk),
  • penalize large hedge inventories (balance-sheet usage),
  • penalize frequent hedge adjustments (trading intensity).

Optimal Policy¶

Under linear dynamics and quadratic costs:

  • the value function is quadratic,
  • the optimal policy is linear in the state: $ u_t = -K \mathbf{s}_t $

The feedback matrix $K$ is obtained by solving the Riccati equation.

This solution is:

  • analytical,
  • stable,
  • fully interpretable.

Why LQ Control Is a Natural Benchmark¶

LQ control represents the best possible policy under the assumptions of:

  • linear dynamics,
  • Gaussian shocks,
  • symmetric (quadratic) costs.

In this project:

  • the AFNS model satisfies these assumptions almost exactly,
  • making LQ control an ideal theoretical benchmark.

Importantly, LQ control is not an approximation here. It is the optimal solution to a well-defined problem.


Role in This Project¶

The LQ solution serves three purposes:

  1. Economic benchmark
    It defines what optimal hedging looks like in a frictionless quadratic world.

  2. Diagnostic tool
    Deviations from LQ performance reveal where assumptions break down.

  3. Reference point for RL
    Reinforcement learning is introduced only when costs become non-quadratic (e.g. L1 transaction costs), a setting where LQ theory no longer applies.


Conceptual Summary¶

  • The AFNS model provides a linear Gaussian state-space description of interest rate dynamics.
  • The Kalman filter extracts latent yield-curve factors optimally.
  • Augmenting the state with hedge inventory transforms the model into a controlled system.
  • Linear–quadratic control delivers the optimal hedging policy under quadratic costs.
  • This classical solution establishes the benchmark against which more flexible methods are evaluated.

We now connect the term-structure state $X_t$ to a simplified IRRBB objective.

Main ingredients:

  • a stylized balance sheet with representative asset and liability repricing maturities,
  • a hedging instrument (FRA-style payoff in this notebook),
  • an objective function that trades off NII risk vs hedge usage and trading costs.

This is intentionally simplified: the goal is a clean, interpretable sandbox where we can compare classical control and RL under controlled assumptions.

LQ control benchmark (quadratic costs)¶

We first set up a classical Linear–Quadratic (LQ) benchmark:

  • State includes curve factors and hedge inventory.
  • Control is the hedge adjustment $u_t = \Delta h_t$.
  • Objective penalizes:
    • NII variability (risk term),
    • hedge inventory (balance sheet usage),
    • trading intensity (turnover).

This is the regime where classical control is expected to perform very well because the dynamics are linear-Gaussian and the objective is quadratic.

In [62]:
# ============================================================
# 0) USER INPUTS
# ============================================================

# Required:
# - X_smooth: array (T, 3) of Kalman-smoothed AFNS factors [L,S,C]
X_smooth = smooth_means
# - cm: calibrated CurveModelAFNS (needs lam, sig1..sig3 and AFNS yield function)

# Banking book + frequency
dt = 1.0 / 12.0         # monthly in years
tau_A = 3.0             # assets repricing maturity (years)
tau_L = 1.0             # liabilities repricing maturity (years)

A_notional = 100.0
L_notional = 100.0

# LQ weights (tune later)
alpha_nii = 1.0          # strength of "penalize NII" term
lambda_u = 1e-2          # trading penalty (smaller => more aggressive hedging)
lambda_h = 1e-7          # hedge inventory penalty (optional)

lambda_u = 1e-3
lambda_h = 1e-6

# Stress scenario
shock_bps = 200          # +200 bps
shock = shock_bps / 10000.0

# ============================================================
# 1) AFNS yield + forward-rate utilities
# ============================================================

def afns_yield_single_from_cm(cm, X, tau):
    """
    Compute AFNS yield y(tau;X) using cm parameters.
    Depends on 
    existing afns_yields_from_factors implementation.
    """
    taus = np.array([tau], dtype=float)
    y = afns_yields_from_factors(X, taus, cm.lam, cm.sig1, cm.sig2, cm.sig3)
    return float(y[0])


def forward_rate_cc_from_cm(cm, X, tau1, tau2):
    """
    Continuous-compounded forward rate f(tau1,tau2):
      f = (tau2*y(tau2) - tau1*y(tau1)) / (tau2 - tau1)
    """
    y1 = afns_yield_single_from_cm(cm, X, tau1)
    y2 = afns_yield_single_from_cm(cm, X, tau2)
    return (tau2 * y2 - tau1 * y1) / (tau2 - tau1)


# ============================================================
# 2) NII definition with FRA hedge
# ============================================================

def compute_unhedged_nii_path(cm, X_path, A_notional, L_notional, tau_A, tau_L, dt):
    """
    Unhedged NII_{t+1} = A*y_t(tauA)*dt - L*y_t(tauL)*dt
    Returns array length T-1.
    """
    T = X_path.shape[0]
    NII0 = np.zeros(T-1)
    for t in range(T-1):
        yA = afns_yield_single_from_cm(cm, X_path[t], tau_A)
        yL = afns_yield_single_from_cm(cm, X_path[t], tau_L)
        NII0[t] = A_notional * yA * dt - L_notional * yL * dt
    return NII0


def compute_hedged_nii_path_FRA(cm, X_path, h_path, A_notional, L_notional, tau_A, tau_L, dt):
    """
    Hedged NII_{t+1} = A*y_t(tauA)*dt - L*y_t(tauL)*dt + h_t*(K_t - y_{t+1}(tauL))*dt
    with K_t = forward(tauL, tauL+dt).
    """
    T = X_path.shape[0]
    NIIh = np.zeros(T-1)
    for t in range(T-1):
        yA = afns_yield_single_from_cm(cm, X_path[t], tau_A)
        yL = afns_yield_single_from_cm(cm, X_path[t], tau_L)

        K_t = forward_rate_cc_from_cm(cm, X_path[t], tau_L, tau_L + dt)
        y_float_next = afns_yield_single_from_cm(cm, X_path[t+1], tau_L)

        NIIh[t] = (A_notional * yA * dt
                   - L_notional * yL * dt
                   + h_path[t] * (K_t - y_float_next) * dt)
    return NIIh


# ============================================================
# 3) Estimate factor dynamics from smoothed factors (AR(1))
#    X_{t+1} = c + Phi X_t + eps
#    We'll convert to mean-reverting form with mu if desired.
# ============================================================

def fit_var1(X):
    """
    Fit VAR(1): X_{t+1} = c + Phi X_t + eps, via OLS.
    Returns c (3,), Phi (3,3), Sigma (3,3).
    """
    X = np.asarray(X, dtype=float)
    Y = X[1:]             # (T-1,3)
    Z = X[:-1]            # (T-1,3)

    # add intercept
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])  # (T-1, 1+3)

    # OLS for each equation
    B = np.linalg.lstsq(Z1, Y, rcond=None)[0]        # (1+3, 3)
    c = B[0]                                         # (3,)
    Phi = B[1:].T                                    # (3,3) because (3x3)

    resid = Y - Z1 @ B                               # (T-1,3)
    Sigma = (resid.T @ resid) / (resid.shape[0] - (1 + 3))  # sample cov

    return c, Phi, Sigma


# ============================================================
# 4) Build LQ problem from scratch
#    State x_t = [L,S,C,h]
#    Dynamics:  X_{t+1} = c + Phi X_t + eps,  h_{t+1} = h_t + u_t
#
#    Objective: penalize (approx NII)^2 + lambda_h h^2 + lambda_u u^2
#
#    Key step: build H_x (sensitivity of NII to state) numerically
# ============================================================

def numerical_grad_y(cm, X_ref, tau, eps=1e-5):
    """
    Numerical gradient of y(tau;X) wrt X=(L,S,C) using central differences.
    Returns grad (3,).
    """
    grad = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        yp = afns_yield_single_from_cm(cm, X_ref + d, tau)
        ym = afns_yield_single_from_cm(cm, X_ref - d, tau)
        grad[i] = (yp - ym) / (2 * eps)
    return grad


def build_Hx_QR_from_nii(cm, X_ref,
                         A_notional, L_notional, tau_A, tau_L, dt,
                         alpha_nii, lambda_h, lambda_u):
    """
    Build H_x, Q_s, R for LQ:
      approx NII(x_t) ≈ H_x' [X_t; h_t]
      => (NII)^2 ≈ x' (alpha * H_x H_x') x
    """
    # Factor sensitivity of the base NII part (using numerical gradients)
    grad_yA = numerical_grad_y(cm, X_ref, tau_A)
    grad_yL = numerical_grad_y(cm, X_ref, tau_L)

    Hx_X = dt * (A_notional * grad_yA - L_notional * grad_yL)  # (3,)

    # Hedge sensitivity via FRA: d/dh of hedge payoff term at ref
    K_ref = forward_rate_cc_from_cm(cm, X_ref, tau_L, tau_L + dt)
    yL_ref = afns_yield_single_from_cm(cm, X_ref, tau_L)
    d_h = (K_ref - yL_ref) * dt   # scalar

    H_x = np.zeros(4)
    H_x[:3] = Hx_X
    H_x[3] = d_h

    # Quadratic cost matrices
    Q_s = alpha_nii * np.outer(H_x, H_x)
    if lambda_h > 0:
        e4 = np.array([0.0, 0.0, 0.0, 1.0])
        Q_s = Q_s + lambda_h * np.outer(e4, e4)

    R = np.array([[lambda_u]], dtype=float)

    return H_x, Q_s, R


def build_AB_from_Phi(Phi):
    A = np.zeros((4, 4))
    A[:3, :3] = Phi
    A[3, 3] = 1.0
    B = np.zeros((4, 1))
    B[3, 0] = 1.0
    return A, B


def solve_discrete_riccati(A, B, Q, R, max_iter=20000, tol=1e-12):
    """
    Iterative solution to discrete algebraic Riccati equation (DARE).
    Returns P, K where u_t = -K x_t.
    """
    A = np.asarray(A, float)
    B = np.asarray(B, float)
    Q = np.asarray(Q, float)
    R = np.asarray(R, float)

    P = Q.copy()
    for _ in range(max_iter):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next

    S = R + B.T @ P @ B
    K = np.linalg.solve(S, B.T @ P @ A)
    return P, K


# ============================================================
# 5) Run LQ hedging on an exogenous factor path
# ============================================================

def run_lq_hedge_on_path(cm, X_path, Phi, H_x, Q_s, R,
                         A_notional, L_notional, tau_A, tau_L, dt):
    """
    Given an exogenous factor path X_path and VAR(1) Phi (for control design),
    compute LQ hedge policy and resulting hedge path h_t and NII.
    """
    A, B = build_AB_from_Phi(Phi)
    P, K = solve_discrete_riccati(A, B, Q_s, R)

    # simulate hedge
    T = X_path.shape[0]
    h = np.zeros(T)
    u = np.zeros(T-1)

    for t in range(T-1):
        x_t = np.array([X_path[t,0], X_path[t,1], X_path[t,2], h[t]])
        u_t = -float(K @ x_t)
        u[t] = u_t
        h[t+1] = h[t] + u_t

    NII_unhedged = compute_unhedged_nii_path(cm, X_path, A_notional, L_notional, tau_A, tau_L, dt)
    NII_hedged = compute_hedged_nii_path_FRA(cm, X_path, h, A_notional, L_notional, tau_A, tau_L, dt)

    return {
        "K": K,
        "h": h,
        "u": u,
        "NII_unhedged": NII_unhedged,
        "NII_hedged": NII_hedged,
    }


# ============================================================
# 6) Stress path builder (shock at t=0, then propagate with Phi, no noise)
# ============================================================

def make_stress_path_from_var1(X0, c, Phi, T, shock_vec=None):
    """
    Deterministic stressed path:
      X_0 = X0 + shock_vec
      X_{t+1} = c + Phi X_t
    """
    X = np.zeros((T, 3))
    if shock_vec is None:
        shock_vec = np.zeros(3)
    X[0] = X0 + shock_vec
    for t in range(T-1):
        X[t+1] = c + Phi @ X[t]
    return X


# ============================================================
# 7) MAIN RUN: baseline + stress
# ============================================================

# --- Fit factor dynamics from smoothed factors ---
c_hat, Phi_hat, Sigma_hat = fit_var1(X_smooth)
X_ref = X_smooth.mean(axis=0)

print("Fitted VAR(1) Phi:\n", Phi_hat)
print("Reference X_ref (mean of smoothed):", X_ref)

# --- Build LQ cost from NII sensitivities ---
H_x, Q_s, R = build_Hx_QR_from_nii(
    cm, X_ref,
    A_notional, L_notional, tau_A, tau_L, dt,
    alpha_nii, lambda_h, lambda_u
)

print("H_x =", H_x)
print("Q_s max abs =", np.max(np.abs(Q_s)))
print("R =", R)

# --- Baseline: use the observed smoothed factor path as exogenous ---
res_base = run_lq_hedge_on_path(
    cm, X_smooth, Phi_hat, H_x, Q_s, R,
    A_notional, L_notional, tau_A, tau_L, dt
)

print("LQ gain K =", res_base["K"])
print("max |h| baseline:", np.max(np.abs(res_base["h"])))
print("std NII unhedged baseline:", np.std(res_base["NII_unhedged"]))
print("std NII hedged baseline:", np.std(res_base["NII_hedged"]))

plt.figure(figsize=(9,4))
plt.plot(res_base["NII_unhedged"], label="Unhedged (baseline)")
plt.plot(res_base["NII_hedged"], label="LQ hedged (baseline)")
plt.title("Baseline path (smoothed factors): Unhedged vs LQ-hedged NII")
plt.xlabel("t (months)")
plt.ylabel("NII")
plt.legend()
plt.tight_layout()
plt.show()

plt.figure(figsize=(9,3))
plt.plot(res_base["h"], label="Hedge notional h_t (baseline)")
plt.title("LQ hedge position (baseline)")
plt.xlabel("t (months)")
plt.ylabel("h_t")
plt.legend()
plt.tight_layout()
plt.show()


# --- Stress: +200bp parallel shock implemented as a level-factor bump at t=0 ---
# You can also shock slope/curvature; start with level.
shock_vec = np.array([shock, 0.0, 0.0])

X_stress = make_stress_path_from_var1(
    X0=X_ref, c=c_hat, Phi=Phi_hat, T=X_smooth.shape[0], shock_vec=shock_vec
)

res_stress = run_lq_hedge_on_path(
    cm, X_stress, Phi_hat, H_x, Q_s, R,
    A_notional, L_notional, tau_A, tau_L, dt
)

print("\n--- STRESS RESULTS (+200bp level shock at t=0, deterministic VAR1 propagation) ---")
print("LQ gain K =", res_stress["K"])
print("max |h| stress:", np.max(np.abs(res_stress["h"])))
print("std NII unhedged stress:", np.std(res_stress["NII_unhedged"]))
print("std NII hedged stress:", np.std(res_stress["NII_hedged"]))

plt.figure(figsize=(9,4))
plt.plot(res_stress["NII_unhedged"], label="Unhedged (stress)")
plt.plot(res_stress["NII_hedged"], label="LQ hedged (stress)")
plt.title("Stress (+200bp level shock at t=0): Unhedged vs LQ-hedged NII")
plt.xlabel("t (months)")
plt.ylabel("NII")
plt.legend()
plt.tight_layout()
plt.show()

plt.figure(figsize=(9,3))
plt.plot(res_stress["h"], label="Hedge notional h_t (stress)")
plt.title("LQ hedge position under stress")
plt.xlabel("t (months)")
plt.ylabel("h_t")
plt.legend()
plt.tight_layout()
plt.show()
Fitted VAR(1) Phi:
 [[ 0.99370822  0.00405803 -0.00149313]
 [-0.01363542  0.99260183  0.00425663]
 [ 0.01979163 -0.02327738  0.95583683]]
Reference X_ref (mean of smoothed): [ 0.07275833 -0.0443458  -0.02780776]
H_x = [ 0.00000000e+00 -1.34869058e+00  9.88642779e-01  1.69524561e-04]
Q_s max abs = 1.8189662905957733
R = [[0.001]]
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\2975159280.py:234: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  u_t = -float(K @ x_t)
LQ gain K = [[ 3.25468959 -6.40606207  1.65585809  0.03156371]]
max |h| baseline: 19.29354216567544
std NII unhedged baseline: 0.041502314399371805
std NII hedged baseline: 0.037672283687605466
No description has been provided for this image
No description has been provided for this image
--- STRESS RESULTS (+200bp level shock at t=0, deterministic VAR1 propagation) ---
LQ gain K = [[ 3.25468959 -6.40606207  1.65585809  0.03156371]]
max |h| stress: 17.558518446200466
std NII unhedged stress: 0.011522314785212952
std NII hedged stress: 0.010116693836186045
No description has been provided for this image
No description has been provided for this image
In [63]:
def summarize_paths(N0, N1, h, u):
    """
    Produce standard metrics for hedging quality and cost.
    N0: unhedged NII (T-1,)
    N1: hedged NII (T-1,)
    h: hedge notional (T,)
    u: hedge trades (T-1,)
    """
    out = {}
    out["std_unhedged"] = float(np.std(N0))
    out["std_hedged"] = float(np.std(N1))
    out["std_reduction_%"] = float(100.0 * (1.0 - out["std_hedged"] / out["std_unhedged"])) if out["std_unhedged"] > 0 else np.nan

    out["min_unhedged"] = float(np.min(N0))
    out["min_hedged"] = float(np.min(N1))
    out["p05_unhedged"] = float(np.quantile(N0, 0.05))
    out["p05_hedged"] = float(np.quantile(N1, 0.05))

    out["mean_unhedged"] = float(np.mean(N0))
    out["mean_hedged"] = float(np.mean(N1))

    # “Cost” proxies
    out["mean_abs_h"] = float(np.mean(np.abs(h)))
    out["max_abs_h"] = float(np.max(np.abs(h)))
    out["mean_abs_u"] = float(np.mean(np.abs(u)))
    out["max_abs_u"] = float(np.max(np.abs(u)))

    return out


def sweep_lq_hyperparams(cm, X_path, Phi_hat, X_ref,
                         A_notional, L_notional, tau_A, tau_L, dt,
                         alpha_nii,
                         lambda_u_grid,
                         lambda_h_grid):
    """
    Sweeps (lambda_u, lambda_h) and returns a DataFrame of metrics.

    Requires build_Hx_QR_from_nii(...) and run_lq_hedge_on_path(...).
    """
    rows = []

    for lambda_u in lambda_u_grid:
        for lambda_h in lambda_h_grid:

            H_x, Q_s, R = build_Hx_QR_from_nii(
                cm, X_ref,
                A_notional, L_notional, tau_A, tau_L, dt,
                alpha_nii=alpha_nii,
                lambda_h=lambda_h,
                lambda_u=lambda_u
            )

            res = run_lq_hedge_on_path(
                cm, X_path, Phi_hat, H_x, Q_s, R,
                A_notional, L_notional, tau_A, tau_L, dt
            )

            metrics = summarize_paths(
                res["NII_unhedged"], res["NII_hedged"], res["h"], res["u"]
            )

            row = {
                "lambda_u": float(lambda_u),
                "lambda_h": float(lambda_h),
                "Hx_hedge_sensitivity": float(H_x[3]),
                "K_max_abs": float(np.max(np.abs(res["K"]))),
                **metrics
            }
            rows.append(row)

    df = pd.DataFrame(rows)

    # useful derived columns
    df["turnover_proxy"] = df["mean_abs_u"]      # rename, but keep explicit too
    df["inventory_proxy"] = df["mean_abs_h"]

    # Sort by best hedging first (std reduction), then lower turnover
    df = df.sort_values(["std_reduction_%", "turnover_proxy"], ascending=[False, True]).reset_index(drop=True)

    return df
In [64]:
lambda_u_grid = [1e-9, 1e-6, 1e-3]
lambda_h_grid = [1e-9, 1e-6, 1e-3]

df_sweep = sweep_lq_hyperparams(
    cm=cm,
    X_path=X_smooth,          # or X_stress
    Phi_hat=Phi_hat,
    X_ref=X_ref,
    A_notional=100.0,
    L_notional=100.0,
    tau_A=3.0,
    tau_L=1.0,
    dt=1/12,
    alpha_nii=1.0,
    lambda_u_grid=lambda_u_grid,
    lambda_h_grid=lambda_h_grid
)

df_sweep
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\2975159280.py:234: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  u_t = -float(K @ x_t)
Out[64]:
lambda_u lambda_h Hx_hedge_sensitivity K_max_abs std_unhedged std_hedged std_reduction_% min_unhedged min_hedged p05_unhedged p05_hedged mean_unhedged mean_hedged mean_abs_h max_abs_h mean_abs_u max_abs_u turnover_proxy inventory_proxy
0 1.000000e-03 1.000000e-06 0.00017 6.406062 0.041502 0.037672 9.228475 -0.069115 -0.064571 -0.028967 -0.028228 0.031514 0.028723 14.301091 19.293542 0.130948 0.445572 0.130948 14.301091
1 1.000000e-06 1.000000e-06 0.00017 140.444405 0.041502 0.038307 7.699966 -0.069115 -0.073067 -0.028967 -0.029062 0.031514 0.028599 7.382631 19.662846 0.599310 3.206865 0.599310 7.382631
2 1.000000e-09 1.000000e-06 0.00017 224.181261 0.041502 0.038391 7.497340 -0.069115 -0.073538 -0.028967 -0.029146 0.031514 0.028622 7.151036 19.601891 0.779073 4.209173 0.779073 7.151036
3 1.000000e-03 1.000000e-03 0.00017 0.618039 0.041502 0.041499 0.008182 -0.069115 -0.069119 -0.028967 -0.028967 0.031514 0.031511 0.007600 0.020229 0.000614 0.003281 0.000614 0.007600
4 1.000000e-06 1.000000e-03 0.00017 0.999002 0.041502 0.041499 0.007974 -0.069115 -0.069120 -0.028967 -0.028967 0.031514 0.031511 0.007356 0.020165 0.000801 0.004330 0.000801 0.007356
5 1.000000e-09 1.000000e-03 0.00017 0.999999 0.041502 0.041499 0.007973 -0.069115 -0.069120 -0.028967 -0.028967 0.031514 0.031511 0.007356 0.020165 0.000802 0.004332 0.000802 0.007356
6 1.000000e-03 1.000000e-09 0.00017 19.492054 0.041502 0.059200 -42.643736 -0.069115 -0.233215 -0.028967 -0.097223 0.031514 -0.004273 241.943762 322.147839 0.812684 2.626348 0.812684 241.943762
7 1.000000e-09 1.000000e-09 0.00017 7519.772929 0.041502 0.120635 -190.670443 -0.069115 -0.541424 -0.028967 -0.350242 0.031514 -0.068566 247.785757 678.185855 26.404902 143.727011 26.404902 247.785757
8 1.000000e-06 1.000000e-09 0.00017 1256.590292 0.041502 0.126251 -204.201778 -0.069115 -0.538397 -0.028967 -0.348999 0.031514 -0.071237 325.598944 700.395119 12.267414 42.837237 12.267414 325.598944
In [65]:
def plot_lq_tradeoff(df, title="LQ hyperparameter sweep: NII risk vs turnover"):
    """
    Scatter plot:
      x = turnover (mean_abs_u)
      y = hedged NII volatility (std_hedged)
    """
    x = df["mean_abs_u"].values
    y = df["std_hedged"].values

    plt.figure(figsize=(8,5))
    plt.scatter(x, y)

    # annotate points with (lambda_u, lambda_h)
    for _, row in df.iterrows():
        plt.annotate(
            f"u={row['lambda_u']:.0e}, h={row['lambda_h']:.0e}",
            (row["mean_abs_u"], row["std_hedged"]),
            fontsize=8,
            xytext=(4, 4),
            textcoords="offset points"
        )

    plt.xlabel("Turnover proxy: mean(|u_t|)")
    plt.ylabel("Risk proxy: std(NII_hedged)")
    plt.title(title)
    plt.tight_layout()
    plt.show()

Choosing penalties: risk–turnover trade-off curve¶

Rather than picking $\lambda_u, \lambda_h$ arbitrarily, we sweep penalty values and measure:

  • NII risk reduction
  • versus trading activity (turnover) and inventory usage.

This produces a Pareto-style trade-off curve.
We then select a benchmark point that delivers meaningful risk reduction without unrealistic trading.

In [66]:
plot_lq_tradeoff(df_sweep)
No description has been provided for this image
In [67]:
def summarize_paths(N0, N1, h, u):
    out = {}
    out["std_unhedged"] = float(np.std(N0))
    out["std_hedged"] = float(np.std(N1))
    out["std_reduction_%"] = float(100.0 * (1.0 - out["std_hedged"]/out["std_unhedged"])) if out["std_unhedged"] > 0 else np.nan

    out["p05_unhedged"] = float(np.quantile(N0, 0.05))
    out["p05_hedged"] = float(np.quantile(N1, 0.05))
    out["min_unhedged"] = float(np.min(N0))
    out["min_hedged"] = float(np.min(N1))

    out["mean_unhedged"] = float(np.mean(N0))
    out["mean_hedged"] = float(np.mean(N1))

    out["mean_abs_h"] = float(np.mean(np.abs(h)))
    out["max_abs_h"]  = float(np.max(np.abs(h)))
    out["mean_abs_u"] = float(np.mean(np.abs(u)))
    out["max_abs_u"]  = float(np.max(np.abs(u)))
    return out


def plot_lq_tradeoff(df, title="LQ sweep: NII risk vs turnover"):
    """
    Scatter: x=turnover (mean|u|), y=risk (std hedged).
    Annotate with lambdas.
    """
    x = df["mean_abs_u"].values
    y = df["std_hedged"].values

    plt.figure(figsize=(8,5))
    plt.scatter(x, y)
    for _, row in df.iterrows():
        plt.annotate(
            f"u={row['lambda_u']:.0e}, h={row['lambda_h']:.0e}",
            (row["mean_abs_u"], row["std_hedged"]),
            fontsize=8, xytext=(4,4), textcoords="offset points"
        )
    plt.xlabel("Turnover proxy: mean(|u_t|)")
    plt.ylabel("Risk proxy: std(NII_hedged)")
    plt.title(title)
    plt.tight_layout()
    plt.show()
    

Simplified NII model and hedge instrument¶

We compute a stylized monthly NII:

  • Assets repricing at a representative maturity $\tau_A$
  • Liabilities repricing at $\tau_L$
  • Hedge is modeled as an FRA-like payoff linked to forward vs realized short/roll rate

This structure is simple enough for transparency but still captures the core IRRBB mechanism: changes in the yield curve shift the rates that drive asset income and liability expense, and hedging offsets part of that sensitivity.

In [68]:
# Monthly setup
dt = 1.0/12.0
tau_A = 3.0
tau_L = 1.0
A_notional = 100.0
L_notional = 100.0

# Fit VAR(1) to smoothed factors
c_hat, Phi_hat, Sigma_hat = fit_var1(X_smooth)
X_ref = X_smooth.mean(axis=0)

print("Phi_hat:\n", Phi_hat)
print("X_ref:", X_ref)

# Stress scenario builders (deterministic paths for clean comparison)
def scenario_parallel_up(T, shock_bps=200):
    shock = shock_bps/10000.0
    shock_vec = np.array([shock, 0.0, 0.0])  # Level shock
    return make_stress_path_from_var1(X0=X_ref, c=c_hat, Phi=Phi_hat, T=T, shock_vec=shock_vec)

def scenario_bear_steepener(T, level_bps=200, slope_bps=100):
    # Simple stylized: +level and -slope (more steepness in NS sign convention)
    level = level_bps/10000.0
    slope = slope_bps/10000.0
    shock_vec = np.array([level, -slope, 0.0])
    return make_stress_path_from_var1(X0=X_ref, c=c_hat, Phi=Phi_hat, T=T, shock_vec=shock_vec)

def scenario_high_vol(T, vol_scale=3.0, seed=123):
    # Stochastic path: amplify innovations
    rng = np.random.default_rng(seed)
    X = np.zeros((T,3))
    X[0] = X_ref.copy()
    for t in range(T-1):
        eps = rng.multivariate_normal(np.zeros(3), vol_scale * Sigma_hat)
        X[t+1] = c_hat + Phi_hat @ X[t] + eps
    return X

T = X_smooth.shape[0]
X_parallel = scenario_parallel_up(T, shock_bps=200)
X_steepen  = scenario_bear_steepener(T, level_bps=200, slope_bps=100)
X_highvol  = scenario_high_vol(T, vol_scale=3.0, seed=123)
Phi_hat:
 [[ 0.99370822  0.00405803 -0.00149313]
 [-0.01363542  0.99260183  0.00425663]
 [ 0.01979163 -0.02327738  0.95583683]]
X_ref: [ 0.07275833 -0.0443458  -0.02780776]
In [69]:
alpha_nii = 1.0

lambda_u_grid = [1e-4, 1e-3, 1e-2]
lambda_h_grid = [1e-7, 1e-6, 1e-5]

rows = []
for lambda_u in lambda_u_grid:
    for lambda_h in lambda_h_grid:

        H_x, Q_s, R = build_Hx_QR_from_nii(
            cm, X_ref,
            A_notional, L_notional, tau_A, tau_L, dt,
            alpha_nii=alpha_nii,
            lambda_h=lambda_h,
            lambda_u=lambda_u
        )

        res = run_lq_hedge_on_path(
            cm, X_smooth, Phi_hat, H_x, Q_s, R,
            A_notional, L_notional, tau_A, tau_L, dt
        )

        metrics = summarize_paths(res["NII_unhedged"], res["NII_hedged"], res["h"], res["u"])
        rows.append({
            "lambda_u": float(lambda_u),
            "lambda_h": float(lambda_h),
            "Hx_hedge_sensitivity": float(H_x[3]),
            "K_max_abs": float(np.max(np.abs(res["K"]))),
            **metrics
        })

df_lq = pd.DataFrame(rows)

# filter out degeneracy if needed
df_lq = df_lq[df_lq["Hx_hedge_sensitivity"].abs() > 1e-10].copy()

# sort: prefer bigger std reduction and lower turnover
df_lq = df_lq.sort_values(["std_reduction_%", "mean_abs_u"], ascending=[False, True]).reset_index(drop=True)

df_lq
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\2975159280.py:234: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  u_t = -float(K @ x_t)
Out[69]:
lambda_u lambda_h Hx_hedge_sensitivity K_max_abs std_unhedged std_hedged std_reduction_% p05_unhedged p05_hedged min_unhedged min_hedged mean_unhedged mean_hedged mean_abs_h max_abs_h mean_abs_u max_abs_u
0 0.0001 1.000000e-07 0.00017 58.550083 0.041502 0.024047 42.057613 -0.028967 -0.030056 -0.069115 -0.067088 0.031514 0.008964 112.845336 155.551100 1.149311 3.772153
1 0.0010 1.000000e-07 0.00017 11.867674 0.041502 0.026164 36.957726 -0.028967 -0.024765 -0.069115 -0.074443 0.031514 0.015548 96.388903 123.418701 0.405471 1.539168
2 0.0100 1.000000e-07 0.00017 2.374251 0.041502 0.033512 19.252694 -0.028967 -0.023946 -0.069115 -0.058499 0.031514 0.026712 34.571771 48.226221 0.114668 0.317721
3 0.0010 1.000000e-06 0.00017 6.406062 0.041502 0.037672 9.228475 -0.028967 -0.028228 -0.069115 -0.064571 0.031514 0.028723 14.301091 19.293542 0.130948 0.445572
4 0.0001 1.000000e-06 0.00017 22.071626 0.041502 0.037781 8.965698 -0.028967 -0.028472 -0.069115 -0.068063 0.031514 0.028564 10.908506 20.326023 0.280277 0.874768
5 0.0100 1.000000e-06 0.00017 1.255649 0.041502 0.038588 7.022279 -0.028967 -0.028169 -0.069115 -0.063581 0.031514 0.029659 11.381624 14.652016 0.045579 0.170295
6 0.0100 1.000000e-05 0.00017 0.647022 0.041502 0.041100 0.969242 -0.028967 -0.028891 -0.069115 -0.068646 0.031514 0.031228 1.468673 1.976866 0.013285 0.045391
7 0.0010 1.000000e-05 0.00017 2.236258 0.041502 0.041112 0.940380 -0.028967 -0.028916 -0.069115 -0.069002 0.031514 0.031211 1.123422 2.084656 0.028567 0.088878
8 0.0001 1.000000e-05 0.00017 6.329965 0.041502 0.041142 0.868566 -0.028967 -0.028950 -0.069115 -0.069369 0.031514 0.031210 0.849649 2.061880 0.044613 0.178204
In [70]:
plot_lq_tradeoff(df_lq, title="Baseline path: LQ sweep (risk vs turnover)")
No description has been provided for this image
In [71]:
benchmark = df_lq.iloc[3].to_dict()
benchmark
Out[71]:
{'lambda_u': 0.001,
 'lambda_h': 1e-06,
 'Hx_hedge_sensitivity': 0.00016952456054489685,
 'K_max_abs': 6.406062071472268,
 'std_unhedged': 0.041502314399371805,
 'std_hedged': 0.037672283687605466,
 'std_reduction_%': 9.228475007225889,
 'p05_unhedged': -0.028967123151622646,
 'p05_hedged': -0.028228493090004823,
 'min_unhedged': -0.06911495202798612,
 'min_hedged': -0.06457091968364496,
 'mean_unhedged': 0.0315137176683032,
 'mean_hedged': 0.028723195974091707,
 'mean_abs_h': 14.301090737582067,
 'max_abs_h': 19.29354216567544,
 'mean_abs_u': 0.13094791972470568,
 'max_abs_u': 0.44557190322919743}

Stress testing protocol¶

We evaluate policies under:

  • baseline (historical smoothed factors),
  • parallel shock (+200bp level),
  • high-vol regime (scaled factor innovations),
  • steepener scenario.

For each scenario we compare:

  • unhedged NII path,
  • LQ-hedged NII path,
  • and later RL-hedged NII path.

Metrics include volatility and downside (e.g., 5% quantile), plus turnover/inventory proxies.

In [72]:
def eval_policy_on_path(X_path, lambda_u, lambda_h, label):
    H_x, Q_s, R = build_Hx_QR_from_nii(
        cm, X_ref,
        A_notional, L_notional, tau_A, tau_L, dt,
        alpha_nii=alpha_nii,
        lambda_h=lambda_h,
        lambda_u=lambda_u
    )

    res = run_lq_hedge_on_path(
        cm, X_path, Phi_hat, H_x, Q_s, R,
        A_notional, L_notional, tau_A, tau_L, dt
    )

    metrics = summarize_paths(res["NII_unhedged"], res["NII_hedged"], res["h"], res["u"])
    return {
        "scenario": label,
        "lambda_u": float(lambda_u),
        "lambda_h": float(lambda_h),
        **metrics
    }, res


# Choose lambdas (use preferred benchmark)
lambda_u_star = float(benchmark["lambda_u"])
lambda_h_star = float(benchmark["lambda_h"])

rows = []
res_store = {}

for label, X_path in [
    ("baseline (smoothed)", X_smooth),
    ("stress: +200bp parallel", X_parallel),
    ("stress: bear steepener", X_steepen),
    ("stress: high vol x3", X_highvol),
]:
    row, res = eval_policy_on_path(X_path, lambda_u_star, lambda_h_star, label)
    rows.append(row)
    res_store[label] = res

df_stress = pd.DataFrame(rows)
df_stress
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\2975159280.py:234: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  u_t = -float(K @ x_t)
Out[72]:
scenario lambda_u lambda_h std_unhedged std_hedged std_reduction_% p05_unhedged p05_hedged min_unhedged min_hedged mean_unhedged mean_hedged mean_abs_h max_abs_h mean_abs_u max_abs_u
0 baseline (smoothed) 0.001 0.000001 0.041502 0.037672 9.228475 -0.028967 -0.028228 -0.069115 -0.064571 0.031514 0.028723 14.301091 19.293542 0.130948 0.445572
1 stress: +200bp parallel 0.001 0.000001 0.011522 0.010117 12.199119 0.021739 0.020354 0.021490 0.020128 0.036319 0.033284 14.585011 17.558518 0.050177 0.539936
2 stress: bear steepener 0.001 0.000001 0.015246 0.013545 11.152510 0.021091 0.019772 0.020990 0.019680 0.038386 0.035136 14.777920 18.258270 0.053618 0.603996
3 stress: high vol x3 0.001 0.000001 0.031582 0.028799 8.811852 -0.010008 -0.010316 -0.058414 -0.054684 0.040042 0.036726 13.971398 17.529290 0.090683 0.484180

Policy evaluation on identical scenarios¶

To make the comparison fair, policies are evaluated on the same underlying factor/yield paths.

This isolates the policy effect:

  • differences in NII are due to hedging decisions,
  • not due to different simulated market paths.

This section produces the main “headline” plots:

  • NII trajectories under stress,
  • hedge inventory paths,
  • and summary metrics.
In [73]:
def plot_paths(res, title):
    plt.figure(figsize=(9,4))
    plt.plot(res["NII_unhedged"], label="Unhedged")
    plt.plot(res["NII_hedged"], label="LQ hedged")
    plt.title(title)
    plt.xlabel("t (months)")
    plt.ylabel("NII")
    plt.legend()
    plt.tight_layout()
    plt.show()

    plt.figure(figsize=(9,3))
    plt.plot(res["h"], label="h_t")
    plt.title(title + " — hedge notional")
    plt.xlabel("t (months)")
    plt.ylabel("h_t")
    plt.legend()
    plt.tight_layout()
    plt.show()

plot_paths(res_store["stress: +200bp parallel"], "LQ benchmark under +200bp parallel shock")
plot_paths(res_store["stress: bear steepener"], "LQ benchmark under bear steepener")
plot_paths(res_store["stress: high vol x3"], "LQ benchmark under high-vol regime")
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Part IV — Reinforcement Learning¶

Beyond Quadratic Costs: L1 Transaction Costs and the Transition to Reinforcement Learning¶

The linear–quadratic (LQ) control framework provides a powerful and interpretable benchmark for dynamic hedging when costs are quadratic. However, real-world hedging problems often involve non-quadratic frictions, most notably transaction costs that scale linearly with trade size.

This section explains how introducing L1 costs fundamentally changes the control problem and motivates the use of reinforcement learning.


Economic Motivation for L1 Transaction Costs¶

Quadratic trading costs imply that:

  • small trades are almost free,
  • frequent rebalancing is optimal,
  • hedge adjustments are smooth and continuous.

In practice, interest rate hedging instruments (IRS, FRA, swaps) are subject to:

  • bid–ask spreads,
  • brokerage fees,
  • balance-sheet and operational costs.

These costs are better approximated by linear (L1) penalties: $ \text{Transaction cost at time } t \;\propto\; |u_t| $

Economically, this means:

  • each trade incurs a fixed marginal cost,
  • small trades are not necessarily cheap,
  • inactivity can be optimal over wide regions of the state space.

The Control Problem with L1 Costs¶

Replacing the quadratic trading penalty with an L1 penalty leads to the objective: $ \min_{u_t} \mathbb{E} \sum_{t=0}^{\infty} \left( \text{NII}_{t+1}^2 + \lambda_h h_t^2 + \kappa_u |u_t| \right) $

Key differences from the LQ case:

  • the cost function is non-differentiable at $u_t = 0$,
  • the value function is no longer quadratic,
  • the optimal policy is no longer linear in the state.

As a result, classical LQ theory no longer applies.


What Breaks in Classical Optimal Control¶

Under L1 costs:

  • the Riccati equation cannot be used,
  • there is no closed-form optimal feedback matrix,
  • certainty equivalence fails.

Most importantly:

The optimal policy develops endogenous “no-trade regions.”

That is, there exist states where: $ u_t^\star = 0 $ even though the hedge is imperfect.

This behavior is well known in impulse control and inventory management problems, but it cannot be represented by linear feedback rules.


State-Space Dynamics Remain Valid¶

Crucially, introducing L1 costs does not invalidate the state-space model.

The dynamics remain: $ \mathbf{s}_{t+1} = A \mathbf{s}_t + B u_t + \boldsymbol{\xi}_{t+1} $

where:

  • $\mathbf{s}_t = (L_t, S_t, C_t, h_t)$,
  • yield-curve factors evolve exogenously under AFNS dynamics,
  • control affects only the hedge inventory.

What changes is not the system, but the optimization problem defined on top of it.


Why Reinforcement Learning Is Appropriate¶

Reinforcement learning (RL) solves dynamic decision problems by:

  • interacting with the environment,
  • learning value functions or policies directly,
  • without requiring smoothness or quadratic structure.

In this project, RL is applied to:

  • the same AFNS-based state-space system,
  • the same NII definition,
  • but with an objective that includes L1 transaction costs.

This allows the agent to:

  • learn sparse trading policies,
  • internalize fixed trading frictions,
  • and optimally balance NII risk against trading intensity.

Relationship Between LQ Control and RL¶

The L1-RL formulation can be viewed as a generalization of LQ control:

  • When transaction costs are quadratic, the optimal RL policy converges toward the LQ solution.
  • When costs are linear, RL departs from linear feedback and learns non-smooth, state-dependent rules.

Thus:

RL does not replace classical control. It extends it to settings where classical assumptions break down.


Role in This Project¶

The comparison between LQ control and RL serves a clear purpose:

  • LQ control defines the optimal benchmark under idealized quadratic costs.
  • RL with L1 costs captures realistic trading frictions and balance-sheet considerations.
  • The performance gap between the two highlights the economic impact of non-quadratic costs.

This framework makes it possible to assess when and why more flexible decision rules are required in interest rate risk management.


Conceptual Summary¶

  • The AFNS model provides a coherent state-space environment.
  • LQ control is optimal under quadratic costs and serves as a theoretical benchmark.
  • L1 transaction costs break the assumptions underlying LQ theory.
  • Reinforcement learning naturally handles non-smooth objectives and sparse actions.
  • Comparing LQ and RL policies reveals the economic consequences of realistic trading frictions.

In this sense, reinforcement learning appears not as a black-box alternative, but as the appropriate solution once the structure of the control problem changes.

In [90]:
import gymnasium as gym
from gymnasium import spaces

class IrrbbNiiHedgeEnv(gym.Env):
    """
    RL environment for IRRBB-style NII hedging using a FRA hedge.

    State: [L, S, C, h] (optionally centered around X_ref)
    Action: u = Δh (hedge adjustment), continuous and bounded
    Dynamics:
      X_{t+1} = c + Phi X_t + eps_t, eps_t ~ N(0, Sigma)
      h_{t+1} = h_t + u_t

    Reward (to maximize):
      r_t = - [ NII_{t+1}^2 + lambda_h*h_t^2 + lambda_u*u_t^2 ]
    """
    metadata = {"render_modes": []}

    def __init__(self,
                 cm,
                 c_hat, Phi_hat, Sigma_hat, X_ref,
                 A_notional=100.0, L_notional=100.0,
                 tau_A=3.0, tau_L=1.0, dt=1/12,
                 lambda_u=1e-3, lambda_h=1e-6,
                 action_max=1.0,
                 episode_len=240,
                 center_state=True,
                 seed=123):
        super().__init__()

        self.cm = cm
        self.c = np.asarray(c_hat, float)
        self.Phi = np.asarray(Phi_hat, float)
        self.Sigma = np.asarray(Sigma_hat, float)
        self.X_ref = np.asarray(X_ref, float)

        self.A_notional = float(A_notional)
        self.L_notional = float(L_notional)
        self.tau_A = float(tau_A)
        self.tau_L = float(tau_L)
        self.dt = float(dt)

        self.lambda_u = float(lambda_u)
        self.lambda_h = float(lambda_h)

        self.action_max = float(action_max)
        self.episode_len = int(episode_len)
        self.center_state = bool(center_state)

        self.rng = np.random.default_rng(seed)

        # Observation: 4D continuous
        obs_high = np.full((4,), np.inf, dtype=np.float32)
        self.observation_space = spaces.Box(low=-obs_high, high=obs_high, dtype=np.float32)

        # Action: 1D continuous Δh bounded
        self.action_space = spaces.Box(low=-self.action_max, high=self.action_max, shape=(1,), dtype=np.float32)

        self.t = 0
        self.X = None
        self.h = None

    # --- AFNS helpers ---
    def _y(self, X, tau):
        taus = np.array([tau], dtype=float)
        return float(afns_yields_from_factors(X, taus, self.cm.lam, self.cm.sig1, self.cm.sig2, self.cm.sig3)[0])

    def _fwd_cc(self, X, tau1, tau2):
        y1 = self._y(X, tau1)
        y2 = self._y(X, tau2)
        return (tau2 * y2 - tau1 * y1) / (tau2 - tau1)

    def _nii_one_step(self, X_t, X_next, h_t):
        # Base repricing NII (simplified)
        yA = self._y(X_t, self.tau_A)
        yL = self._y(X_t, self.tau_L)

        # FRA fixed rate: forward from tau_L to tau_L+dt
        K_t = self._fwd_cc(X_t, self.tau_L, self.tau_L + self.dt)
        y_float_next = self._y(X_next, self.tau_L)

        return (self.A_notional * yA * self.dt
                - self.L_notional * yL * self.dt
                + h_t * (K_t - y_float_next) * self.dt)

    def _obs(self):
        x = np.array([self.X[0], self.X[1], self.X[2], self.h], dtype=np.float32)
        if self.center_state:
            x[:3] = x[:3] - self.X_ref.astype(np.float32)
        return x

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng = np.random.default_rng(seed)

        self.t = 0

        # Start near reference with small randomization
        X0 = self.X_ref + self.rng.multivariate_normal(np.zeros(3), 0.1 * self.Sigma)
        self.X = np.asarray(X0, float)
        self.h = 0.0

        return self._obs(), {}

    def step(self, action):
        action = np.asarray(action, dtype=float)
        u = float(np.clip(action[0], -self.action_max, self.action_max))

        # Next factors
        eps = self.rng.multivariate_normal(np.zeros(3), self.Sigma)
        X_next = self.c + self.Phi @ self.X + eps

        # NII uses h_t (pre-trade) in our convention (common in discrete-time setups)
        nii = self._nii_one_step(self.X, X_next, self.h)

        # Update hedge
        h_next = self.h + u

        # Reward: negative quadratic "cost"
        reward = -(nii**2 + self.lambda_h * (self.h**2) + self.lambda_u * (u**2))

        # Transition
        self.X = X_next
        self.h = h_next
        self.t += 1

        terminated = False
        truncated = (self.t >= self.episode_len)

        info = {"nii": nii, "h": self.h, "u": u}

        return self._obs(), float(reward), terminated, truncated, info

In the baseline linear–quadratic setting, classical LQ control has a structural advantage:

  • linear dynamics,
  • quadratic objective,
  • continuous actions.

So it is expected to be near-optimal.

RL becomes compelling when we introduce realistic features that break LQ assumptions, such as:

  • non-quadratic transaction costs (L1),
  • no-trade regions / discrete trading,
  • regime switching or nonlinearities,
  • tail-risk objectives.

We start by implementing an RL agent in the same environment, then extend the objective to L1 costs where RL has a fair advantage.

In [91]:
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env

np.random.seed(123)

# Fit VAR(1) once
c_hat, Phi_hat, Sigma_hat = fit_var1(X_smooth)
X_ref = X_smooth.mean(axis=0)

# Use the same lambdas previously selected for LQ benchmark
lambda_u_rl = float(lambda_u_star)
lambda_h_rl = float(lambda_h_star)

def make_env():
    return IrrbbNiiHedgeEnv(
        cm=cm,
        c_hat=c_hat, Phi_hat=Phi_hat, Sigma_hat=Sigma_hat, X_ref=X_ref,
        A_notional=100.0, L_notional=100.0,
        tau_A=3.0, tau_L=1.0, dt=1/12,
        lambda_u=lambda_u_rl, lambda_h=lambda_h_rl,
        action_max=50,
        episode_len=240,          # 20 years monthly
        center_state=True,
        seed=123
    )

vec_env = make_vec_env(make_env, n_envs=8)  # parallel rollouts

model = SAC(
    "MlpPolicy",
    vec_env,
    verbose=1,
    learning_rate=3e-4,
    batch_size=256,
    buffer_size=200_000,
    train_freq=1,
    gradient_steps=1,
    gamma=0.99,
    tau=0.005,
)

model.learn(total_timesteps=300_000)

# Save if you want
model.save("sac_irrbb_nii_hedge")
Using cpu device
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -364     |
| time/              |          |
|    episodes        | 4        |
|    fps             | 160      |
|    time_elapsed    | 11       |
|    total_timesteps | 1920     |
| train/             |          |
|    actor_loss      | 13.9     |
|    critic_loss     | 7.2      |
|    ent_coef        | 1.03     |
|    ent_coef_loss   | 0.0366   |
|    learning_rate   | 0.0003   |
|    n_updates       | 227      |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -364     |
| time/              |          |
|    episodes        | 8        |
|    fps             | 159      |
|    time_elapsed    | 12       |
|    total_timesteps | 1920     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -328     |
| time/              |          |
|    episodes        | 12       |
|    fps             | 149      |
|    time_elapsed    | 25       |
|    total_timesteps | 3840     |
| train/             |          |
|    actor_loss      | 13.3     |
|    critic_loss     | 8.89     |
|    ent_coef        | 1.05     |
|    ent_coef_loss   | 0.0462   |
|    learning_rate   | 0.0003   |
|    n_updates       | 467      |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -328     |
| time/              |          |
|    episodes        | 16       |
|    fps             | 149      |
|    time_elapsed    | 25       |
|    total_timesteps | 3840     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -287     |
| time/              |          |
|    episodes        | 20       |
|    fps             | 149      |
|    time_elapsed    | 38       |
|    total_timesteps | 5760     |
| train/             |          |
|    actor_loss      | 11.4     |
|    critic_loss     | 5.19     |
|    ent_coef        | 0.968    |
|    ent_coef_loss   | -0.0485  |
|    learning_rate   | 0.0003   |
|    n_updates       | 707      |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -287     |
| time/              |          |
|    episodes        | 24       |
|    fps             | 149      |
|    time_elapsed    | 38       |
|    total_timesteps | 5760     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -265     |
| time/              |          |
|    episodes        | 28       |
|    fps             | 151      |
|    time_elapsed    | 50       |
|    total_timesteps | 7680     |
| train/             |          |
|    actor_loss      | 9.16     |
|    critic_loss     | 3.4      |
|    ent_coef        | 0.893    |
|    ent_coef_loss   | -0.166   |
|    learning_rate   | 0.0003   |
|    n_updates       | 947      |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -265     |
| time/              |          |
|    episodes        | 32       |
|    fps             | 151      |
|    time_elapsed    | 50       |
|    total_timesteps | 7680     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -249     |
| time/              |          |
|    episodes        | 36       |
|    fps             | 151      |
|    time_elapsed    | 63       |
|    total_timesteps | 9600     |
| train/             |          |
|    actor_loss      | 8.58     |
|    critic_loss     | 1.87     |
|    ent_coef        | 0.83     |
|    ent_coef_loss   | -0.261   |
|    learning_rate   | 0.0003   |
|    n_updates       | 1187     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -249     |
| time/              |          |
|    episodes        | 40       |
|    fps             | 151      |
|    time_elapsed    | 63       |
|    total_timesteps | 9600     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -237     |
| time/              |          |
|    episodes        | 44       |
|    fps             | 149      |
|    time_elapsed    | 77       |
|    total_timesteps | 11520    |
| train/             |          |
|    actor_loss      | 6.35     |
|    critic_loss     | 0.766    |
|    ent_coef        | 0.772    |
|    ent_coef_loss   | -0.313   |
|    learning_rate   | 0.0003   |
|    n_updates       | 1427     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -237     |
| time/              |          |
|    episodes        | 48       |
|    fps             | 149      |
|    time_elapsed    | 77       |
|    total_timesteps | 11520    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -223     |
| time/              |          |
|    episodes        | 52       |
|    fps             | 147      |
|    time_elapsed    | 91       |
|    total_timesteps | 13440    |
| train/             |          |
|    actor_loss      | 7.99     |
|    critic_loss     | 1.25     |
|    ent_coef        | 0.722    |
|    ent_coef_loss   | -0.44    |
|    learning_rate   | 0.0003   |
|    n_updates       | 1667     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -223     |
| time/              |          |
|    episodes        | 56       |
|    fps             | 147      |
|    time_elapsed    | 91       |
|    total_timesteps | 13440    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -209     |
| time/              |          |
|    episodes        | 60       |
|    fps             | 146      |
|    time_elapsed    | 105      |
|    total_timesteps | 15360    |
| train/             |          |
|    actor_loss      | 6.91     |
|    critic_loss     | 0.428    |
|    ent_coef        | 0.68     |
|    ent_coef_loss   | -0.337   |
|    learning_rate   | 0.0003   |
|    n_updates       | 1907     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -209     |
| time/              |          |
|    episodes        | 64       |
|    fps             | 146      |
|    time_elapsed    | 105      |
|    total_timesteps | 15360    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -194     |
| time/              |          |
|    episodes        | 68       |
|    fps             | 146      |
|    time_elapsed    | 117      |
|    total_timesteps | 17280    |
| train/             |          |
|    actor_loss      | 4.66     |
|    critic_loss     | 0.27     |
|    ent_coef        | 0.64     |
|    ent_coef_loss   | -0.53    |
|    learning_rate   | 0.0003   |
|    n_updates       | 2147     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -194     |
| time/              |          |
|    episodes        | 72       |
|    fps             | 146      |
|    time_elapsed    | 117      |
|    total_timesteps | 17280    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -181     |
| time/              |          |
|    episodes        | 76       |
|    fps             | 147      |
|    time_elapsed    | 130      |
|    total_timesteps | 19200    |
| train/             |          |
|    actor_loss      | 6.84     |
|    critic_loss     | 0.634    |
|    ent_coef        | 0.604    |
|    ent_coef_loss   | -0.172   |
|    learning_rate   | 0.0003   |
|    n_updates       | 2387     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -181     |
| time/              |          |
|    episodes        | 80       |
|    fps             | 147      |
|    time_elapsed    | 130      |
|    total_timesteps | 19200    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -169     |
| time/              |          |
|    episodes        | 84       |
|    fps             | 147      |
|    time_elapsed    | 143      |
|    total_timesteps | 21120    |
| train/             |          |
|    actor_loss      | 5.26     |
|    critic_loss     | 0.668    |
|    ent_coef        | 0.571    |
|    ent_coef_loss   | -0.357   |
|    learning_rate   | 0.0003   |
|    n_updates       | 2627     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -169     |
| time/              |          |
|    episodes        | 88       |
|    fps             | 147      |
|    time_elapsed    | 143      |
|    total_timesteps | 21120    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -159     |
| time/              |          |
|    episodes        | 92       |
|    fps             | 146      |
|    time_elapsed    | 156      |
|    total_timesteps | 23040    |
| train/             |          |
|    actor_loss      | 4.84     |
|    critic_loss     | 0.262    |
|    ent_coef        | 0.537    |
|    ent_coef_loss   | -0.665   |
|    learning_rate   | 0.0003   |
|    n_updates       | 2867     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -159     |
| time/              |          |
|    episodes        | 96       |
|    fps             | 146      |
|    time_elapsed    | 156      |
|    total_timesteps | 23040    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -143     |
| time/              |          |
|    episodes        | 100      |
|    fps             | 146      |
|    time_elapsed    | 170      |
|    total_timesteps | 24960    |
| train/             |          |
|    actor_loss      | 5.05     |
|    critic_loss     | 0.324    |
|    ent_coef        | 0.507    |
|    ent_coef_loss   | -0.545   |
|    learning_rate   | 0.0003   |
|    n_updates       | 3107     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -143     |
| time/              |          |
|    episodes        | 104      |
|    fps             | 146      |
|    time_elapsed    | 170      |
|    total_timesteps | 24960    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -119     |
| time/              |          |
|    episodes        | 108      |
|    fps             | 146      |
|    time_elapsed    | 182      |
|    total_timesteps | 26880    |
| train/             |          |
|    actor_loss      | 5.13     |
|    critic_loss     | 0.181    |
|    ent_coef        | 0.476    |
|    ent_coef_loss   | -0.771   |
|    learning_rate   | 0.0003   |
|    n_updates       | 3347     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -119     |
| time/              |          |
|    episodes        | 112      |
|    fps             | 146      |
|    time_elapsed    | 182      |
|    total_timesteps | 26880    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -103     |
| time/              |          |
|    episodes        | 116      |
|    fps             | 147      |
|    time_elapsed    | 195      |
|    total_timesteps | 28800    |
| train/             |          |
|    actor_loss      | 5.3      |
|    critic_loss     | 1.33     |
|    ent_coef        | 0.447    |
|    ent_coef_loss   | -0.688   |
|    learning_rate   | 0.0003   |
|    n_updates       | 3587     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -103     |
| time/              |          |
|    episodes        | 120      |
|    fps             | 147      |
|    time_elapsed    | 195      |
|    total_timesteps | 28800    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -90.7    |
| time/              |          |
|    episodes        | 124      |
|    fps             | 147      |
|    time_elapsed    | 208      |
|    total_timesteps | 30720    |
| train/             |          |
|    actor_loss      | 5.13     |
|    critic_loss     | 0.232    |
|    ent_coef        | 0.418    |
|    ent_coef_loss   | -0.789   |
|    learning_rate   | 0.0003   |
|    n_updates       | 3827     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -90.7    |
| time/              |          |
|    episodes        | 128      |
|    fps             | 147      |
|    time_elapsed    | 208      |
|    total_timesteps | 30720    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -79.1    |
| time/              |          |
|    episodes        | 132      |
|    fps             | 146      |
|    time_elapsed    | 222      |
|    total_timesteps | 32640    |
| train/             |          |
|    actor_loss      | 4.55     |
|    critic_loss     | 0.394    |
|    ent_coef        | 0.392    |
|    ent_coef_loss   | -0.777   |
|    learning_rate   | 0.0003   |
|    n_updates       | 4067     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -79.1    |
| time/              |          |
|    episodes        | 136      |
|    fps             | 146      |
|    time_elapsed    | 222      |
|    total_timesteps | 32640    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -67.7    |
| time/              |          |
|    episodes        | 140      |
|    fps             | 147      |
|    time_elapsed    | 234      |
|    total_timesteps | 34560    |
| train/             |          |
|    actor_loss      | 6.2      |
|    critic_loss     | 0.201    |
|    ent_coef        | 0.366    |
|    ent_coef_loss   | -0.156   |
|    learning_rate   | 0.0003   |
|    n_updates       | 4307     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -67.7    |
| time/              |          |
|    episodes        | 144      |
|    fps             | 147      |
|    time_elapsed    | 234      |
|    total_timesteps | 34560    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -58.2    |
| time/              |          |
|    episodes        | 148      |
|    fps             | 146      |
|    time_elapsed    | 248      |
|    total_timesteps | 36480    |
| train/             |          |
|    actor_loss      | 6.65     |
|    critic_loss     | 0.47     |
|    ent_coef        | 0.344    |
|    ent_coef_loss   | -0.55    |
|    learning_rate   | 0.0003   |
|    n_updates       | 4547     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -58.2    |
| time/              |          |
|    episodes        | 152      |
|    fps             | 146      |
|    time_elapsed    | 249      |
|    total_timesteps | 36480    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -51.5    |
| time/              |          |
|    episodes        | 156      |
|    fps             | 146      |
|    time_elapsed    | 262      |
|    total_timesteps | 38400    |
| train/             |          |
|    actor_loss      | 6.42     |
|    critic_loss     | 0.444    |
|    ent_coef        | 0.322    |
|    ent_coef_loss   | -1.1     |
|    learning_rate   | 0.0003   |
|    n_updates       | 4787     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -51.5    |
| time/              |          |
|    episodes        | 160      |
|    fps             | 146      |
|    time_elapsed    | 262      |
|    total_timesteps | 38400    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -46.8    |
| time/              |          |
|    episodes        | 164      |
|    fps             | 145      |
|    time_elapsed    | 276      |
|    total_timesteps | 40320    |
| train/             |          |
|    actor_loss      | 5.39     |
|    critic_loss     | 0.0763   |
|    ent_coef        | 0.303    |
|    ent_coef_loss   | -1.02    |
|    learning_rate   | 0.0003   |
|    n_updates       | 5027     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -46.8    |
| time/              |          |
|    episodes        | 168      |
|    fps             | 145      |
|    time_elapsed    | 276      |
|    total_timesteps | 40320    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -44.4    |
| time/              |          |
|    episodes        | 172      |
|    fps             | 145      |
|    time_elapsed    | 290      |
|    total_timesteps | 42240    |
| train/             |          |
|    actor_loss      | 5.65     |
|    critic_loss     | 0.192    |
|    ent_coef        | 0.284    |
|    ent_coef_loss   | -1.1     |
|    learning_rate   | 0.0003   |
|    n_updates       | 5267     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -44.4    |
| time/              |          |
|    episodes        | 176      |
|    fps             | 145      |
|    time_elapsed    | 290      |
|    total_timesteps | 42240    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -42.6    |
| time/              |          |
|    episodes        | 180      |
|    fps             | 144      |
|    time_elapsed    | 304      |
|    total_timesteps | 44160    |
| train/             |          |
|    actor_loss      | 5.64     |
|    critic_loss     | 0.102    |
|    ent_coef        | 0.266    |
|    ent_coef_loss   | -1.18    |
|    learning_rate   | 0.0003   |
|    n_updates       | 5507     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -42.6    |
| time/              |          |
|    episodes        | 184      |
|    fps             | 144      |
|    time_elapsed    | 304      |
|    total_timesteps | 44160    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -41.2    |
| time/              |          |
|    episodes        | 188      |
|    fps             | 144      |
|    time_elapsed    | 318      |
|    total_timesteps | 46080    |
| train/             |          |
|    actor_loss      | 6.13     |
|    critic_loss     | 0.219    |
|    ent_coef        | 0.249    |
|    ent_coef_loss   | -0.913   |
|    learning_rate   | 0.0003   |
|    n_updates       | 5747     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -41.2    |
| time/              |          |
|    episodes        | 192      |
|    fps             | 144      |
|    time_elapsed    | 318      |
|    total_timesteps | 46080    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -39.8    |
| time/              |          |
|    episodes        | 196      |
|    fps             | 144      |
|    time_elapsed    | 332      |
|    total_timesteps | 48000    |
| train/             |          |
|    actor_loss      | 6.32     |
|    critic_loss     | 0.114    |
|    ent_coef        | 0.234    |
|    ent_coef_loss   | -1.18    |
|    learning_rate   | 0.0003   |
|    n_updates       | 5987     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -39.8    |
| time/              |          |
|    episodes        | 200      |
|    fps             | 144      |
|    time_elapsed    | 332      |
|    total_timesteps | 48000    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -38.4    |
| time/              |          |
|    episodes        | 204      |
|    fps             | 144      |
|    time_elapsed    | 346      |
|    total_timesteps | 49920    |
| train/             |          |
|    actor_loss      | 5.89     |
|    critic_loss     | 0.0893   |
|    ent_coef        | 0.219    |
|    ent_coef_loss   | -1.14    |
|    learning_rate   | 0.0003   |
|    n_updates       | 6227     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -38.4    |
| time/              |          |
|    episodes        | 208      |
|    fps             | 144      |
|    time_elapsed    | 346      |
|    total_timesteps | 49920    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -36.9    |
| time/              |          |
|    episodes        | 212      |
|    fps             | 143      |
|    time_elapsed    | 360      |
|    total_timesteps | 51840    |
| train/             |          |
|    actor_loss      | 5.78     |
|    critic_loss     | 0.0792   |
|    ent_coef        | 0.205    |
|    ent_coef_loss   | -1.21    |
|    learning_rate   | 0.0003   |
|    n_updates       | 6467     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -36.9    |
| time/              |          |
|    episodes        | 216      |
|    fps             | 143      |
|    time_elapsed    | 360      |
|    total_timesteps | 51840    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -35.2    |
| time/              |          |
|    episodes        | 220      |
|    fps             | 143      |
|    time_elapsed    | 373      |
|    total_timesteps | 53760    |
| train/             |          |
|    actor_loss      | 6.01     |
|    critic_loss     | 0.0382   |
|    ent_coef        | 0.192    |
|    ent_coef_loss   | -0.709   |
|    learning_rate   | 0.0003   |
|    n_updates       | 6707     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -35.2    |
| time/              |          |
|    episodes        | 224      |
|    fps             | 143      |
|    time_elapsed    | 373      |
|    total_timesteps | 53760    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -33.9    |
| time/              |          |
|    episodes        | 228      |
|    fps             | 143      |
|    time_elapsed    | 387      |
|    total_timesteps | 55680    |
| train/             |          |
|    actor_loss      | 7.14     |
|    critic_loss     | 0.104    |
|    ent_coef        | 0.181    |
|    ent_coef_loss   | 0.248    |
|    learning_rate   | 0.0003   |
|    n_updates       | 6947     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -33.9    |
| time/              |          |
|    episodes        | 232      |
|    fps             | 143      |
|    time_elapsed    | 387      |
|    total_timesteps | 55680    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -32.4    |
| time/              |          |
|    episodes        | 236      |
|    fps             | 143      |
|    time_elapsed    | 401      |
|    total_timesteps | 57600    |
| train/             |          |
|    actor_loss      | 6.59     |
|    critic_loss     | 0.334    |
|    ent_coef        | 0.17     |
|    ent_coef_loss   | -1.38    |
|    learning_rate   | 0.0003   |
|    n_updates       | 7187     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -32.4    |
| time/              |          |
|    episodes        | 240      |
|    fps             | 143      |
|    time_elapsed    | 401      |
|    total_timesteps | 57600    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -30.8    |
| time/              |          |
|    episodes        | 244      |
|    fps             | 143      |
|    time_elapsed    | 415      |
|    total_timesteps | 59520    |
| train/             |          |
|    actor_loss      | 6.44     |
|    critic_loss     | 0.0724   |
|    ent_coef        | 0.16     |
|    ent_coef_loss   | -1.05    |
|    learning_rate   | 0.0003   |
|    n_updates       | 7427     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -30.8    |
| time/              |          |
|    episodes        | 248      |
|    fps             | 143      |
|    time_elapsed    | 415      |
|    total_timesteps | 59520    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -29.2    |
| time/              |          |
|    episodes        | 252      |
|    fps             | 143      |
|    time_elapsed    | 429      |
|    total_timesteps | 61440    |
| train/             |          |
|    actor_loss      | 6.39     |
|    critic_loss     | 0.0203   |
|    ent_coef        | 0.151    |
|    ent_coef_loss   | -1.17    |
|    learning_rate   | 0.0003   |
|    n_updates       | 7667     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -29.2    |
| time/              |          |
|    episodes        | 256      |
|    fps             | 143      |
|    time_elapsed    | 429      |
|    total_timesteps | 61440    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -27.7    |
| time/              |          |
|    episodes        | 260      |
|    fps             | 142      |
|    time_elapsed    | 443      |
|    total_timesteps | 63360    |
| train/             |          |
|    actor_loss      | 6.2      |
|    critic_loss     | 0.0613   |
|    ent_coef        | 0.147    |
|    ent_coef_loss   | -1.09    |
|    learning_rate   | 0.0003   |
|    n_updates       | 7907     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -27.7    |
| time/              |          |
|    episodes        | 264      |
|    fps             | 142      |
|    time_elapsed    | 443      |
|    total_timesteps | 63360    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -26.3    |
| time/              |          |
|    episodes        | 268      |
|    fps             | 142      |
|    time_elapsed    | 457      |
|    total_timesteps | 65280    |
| train/             |          |
|    actor_loss      | 6.47     |
|    critic_loss     | 0.0402   |
|    ent_coef        | 0.139    |
|    ent_coef_loss   | -0.771   |
|    learning_rate   | 0.0003   |
|    n_updates       | 8147     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -26.3    |
| time/              |          |
|    episodes        | 272      |
|    fps             | 142      |
|    time_elapsed    | 457      |
|    total_timesteps | 65280    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -24.9    |
| time/              |          |
|    episodes        | 276      |
|    fps             | 142      |
|    time_elapsed    | 471      |
|    total_timesteps | 67200    |
| train/             |          |
|    actor_loss      | 6.97     |
|    critic_loss     | 0.0437   |
|    ent_coef        | 0.131    |
|    ent_coef_loss   | -0.15    |
|    learning_rate   | 0.0003   |
|    n_updates       | 8387     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -24.9    |
| time/              |          |
|    episodes        | 280      |
|    fps             | 142      |
|    time_elapsed    | 471      |
|    total_timesteps | 67200    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -23.5    |
| time/              |          |
|    episodes        | 284      |
|    fps             | 142      |
|    time_elapsed    | 485      |
|    total_timesteps | 69120    |
| train/             |          |
|    actor_loss      | 6.79     |
|    critic_loss     | 0.0115   |
|    ent_coef        | 0.124    |
|    ent_coef_loss   | -0.108   |
|    learning_rate   | 0.0003   |
|    n_updates       | 8627     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -23.5    |
| time/              |          |
|    episodes        | 288      |
|    fps             | 142      |
|    time_elapsed    | 485      |
|    total_timesteps | 69120    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -22.2    |
| time/              |          |
|    episodes        | 292      |
|    fps             | 142      |
|    time_elapsed    | 499      |
|    total_timesteps | 71040    |
| train/             |          |
|    actor_loss      | 6.79     |
|    critic_loss     | 0.0209   |
|    ent_coef        | 0.118    |
|    ent_coef_loss   | -1.2     |
|    learning_rate   | 0.0003   |
|    n_updates       | 8867     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -22.2    |
| time/              |          |
|    episodes        | 296      |
|    fps             | 142      |
|    time_elapsed    | 499      |
|    total_timesteps | 71040    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -20.9    |
| time/              |          |
|    episodes        | 300      |
|    fps             | 142      |
|    time_elapsed    | 513      |
|    total_timesteps | 72960    |
| train/             |          |
|    actor_loss      | 7.05     |
|    critic_loss     | 0.0253   |
|    ent_coef        | 0.111    |
|    ent_coef_loss   | -0.654   |
|    learning_rate   | 0.0003   |
|    n_updates       | 9107     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -20.9    |
| time/              |          |
|    episodes        | 304      |
|    fps             | 142      |
|    time_elapsed    | 513      |
|    total_timesteps | 72960    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -19.7    |
| time/              |          |
|    episodes        | 308      |
|    fps             | 142      |
|    time_elapsed    | 527      |
|    total_timesteps | 74880    |
| train/             |          |
|    actor_loss      | 6.78     |
|    critic_loss     | 0.0151   |
|    ent_coef        | 0.106    |
|    ent_coef_loss   | -0.997   |
|    learning_rate   | 0.0003   |
|    n_updates       | 9347     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -19.7    |
| time/              |          |
|    episodes        | 312      |
|    fps             | 142      |
|    time_elapsed    | 527      |
|    total_timesteps | 74880    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -18.7    |
| time/              |          |
|    episodes        | 316      |
|    fps             | 141      |
|    time_elapsed    | 540      |
|    total_timesteps | 76800    |
| train/             |          |
|    actor_loss      | 7.12     |
|    critic_loss     | 0.122    |
|    ent_coef        | 0.101    |
|    ent_coef_loss   | -0.565   |
|    learning_rate   | 0.0003   |
|    n_updates       | 9587     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -18.7    |
| time/              |          |
|    episodes        | 320      |
|    fps             | 141      |
|    time_elapsed    | 540      |
|    total_timesteps | 76800    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -17.7    |
| time/              |          |
|    episodes        | 324      |
|    fps             | 141      |
|    time_elapsed    | 555      |
|    total_timesteps | 78720    |
| train/             |          |
|    actor_loss      | 7        |
|    critic_loss     | 0.0235   |
|    ent_coef        | 0.096    |
|    ent_coef_loss   | -0.577   |
|    learning_rate   | 0.0003   |
|    n_updates       | 9827     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -17.7    |
| time/              |          |
|    episodes        | 328      |
|    fps             | 141      |
|    time_elapsed    | 555      |
|    total_timesteps | 78720    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -16.9    |
| time/              |          |
|    episodes        | 332      |
|    fps             | 141      |
|    time_elapsed    | 569      |
|    total_timesteps | 80640    |
| train/             |          |
|    actor_loss      | 7.03     |
|    critic_loss     | 0.0226   |
|    ent_coef        | 0.0925   |
|    ent_coef_loss   | -0.775   |
|    learning_rate   | 0.0003   |
|    n_updates       | 10067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -16.9    |
| time/              |          |
|    episodes        | 336      |
|    fps             | 141      |
|    time_elapsed    | 569      |
|    total_timesteps | 80640    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -16      |
| time/              |          |
|    episodes        | 340      |
|    fps             | 141      |
|    time_elapsed    | 584      |
|    total_timesteps | 82560    |
| train/             |          |
|    actor_loss      | 7.2      |
|    critic_loss     | 0.0333   |
|    ent_coef        | 0.0903   |
|    ent_coef_loss   | 0.455    |
|    learning_rate   | 0.0003   |
|    n_updates       | 10307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -16      |
| time/              |          |
|    episodes        | 344      |
|    fps             | 141      |
|    time_elapsed    | 584      |
|    total_timesteps | 82560    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -15.2    |
| time/              |          |
|    episodes        | 348      |
|    fps             | 141      |
|    time_elapsed    | 597      |
|    total_timesteps | 84480    |
| train/             |          |
|    actor_loss      | 7.24     |
|    critic_loss     | 0.112    |
|    ent_coef        | 0.0884   |
|    ent_coef_loss   | 0.627    |
|    learning_rate   | 0.0003   |
|    n_updates       | 10547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -15.2    |
| time/              |          |
|    episodes        | 352      |
|    fps             | 141      |
|    time_elapsed    | 597      |
|    total_timesteps | 84480    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -14.6    |
| time/              |          |
|    episodes        | 356      |
|    fps             | 141      |
|    time_elapsed    | 610      |
|    total_timesteps | 86400    |
| train/             |          |
|    actor_loss      | 7.05     |
|    critic_loss     | 0.00427  |
|    ent_coef        | 0.0848   |
|    ent_coef_loss   | -0.756   |
|    learning_rate   | 0.0003   |
|    n_updates       | 10787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -14.6    |
| time/              |          |
|    episodes        | 360      |
|    fps             | 141      |
|    time_elapsed    | 610      |
|    total_timesteps | 86400    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -14      |
| time/              |          |
|    episodes        | 364      |
|    fps             | 141      |
|    time_elapsed    | 623      |
|    total_timesteps | 88320    |
| train/             |          |
|    actor_loss      | 7.23     |
|    critic_loss     | 0.0206   |
|    ent_coef        | 0.0808   |
|    ent_coef_loss   | 0.518    |
|    learning_rate   | 0.0003   |
|    n_updates       | 11027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -14      |
| time/              |          |
|    episodes        | 368      |
|    fps             | 141      |
|    time_elapsed    | 623      |
|    total_timesteps | 88320    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -13.3    |
| time/              |          |
|    episodes        | 372      |
|    fps             | 141      |
|    time_elapsed    | 636      |
|    total_timesteps | 90240    |
| train/             |          |
|    actor_loss      | 7.26     |
|    critic_loss     | 0.0166   |
|    ent_coef        | 0.0781   |
|    ent_coef_loss   | -0.659   |
|    learning_rate   | 0.0003   |
|    n_updates       | 11267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -13.3    |
| time/              |          |
|    episodes        | 376      |
|    fps             | 141      |
|    time_elapsed    | 636      |
|    total_timesteps | 90240    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -12.7    |
| time/              |          |
|    episodes        | 380      |
|    fps             | 141      |
|    time_elapsed    | 649      |
|    total_timesteps | 92160    |
| train/             |          |
|    actor_loss      | 7.15     |
|    critic_loss     | 0.0058   |
|    ent_coef        | 0.0742   |
|    ent_coef_loss   | -0.544   |
|    learning_rate   | 0.0003   |
|    n_updates       | 11507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -12.7    |
| time/              |          |
|    episodes        | 384      |
|    fps             | 141      |
|    time_elapsed    | 649      |
|    total_timesteps | 92160    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -12.2    |
| time/              |          |
|    episodes        | 388      |
|    fps             | 142      |
|    time_elapsed    | 661      |
|    total_timesteps | 94080    |
| train/             |          |
|    actor_loss      | 7.31     |
|    critic_loss     | 0.00965  |
|    ent_coef        | 0.0722   |
|    ent_coef_loss   | -0.68    |
|    learning_rate   | 0.0003   |
|    n_updates       | 11747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -12.2    |
| time/              |          |
|    episodes        | 392      |
|    fps             | 142      |
|    time_elapsed    | 661      |
|    total_timesteps | 94080    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -11.6    |
| time/              |          |
|    episodes        | 396      |
|    fps             | 142      |
|    time_elapsed    | 675      |
|    total_timesteps | 96000    |
| train/             |          |
|    actor_loss      | 7.38     |
|    critic_loss     | 0.0113   |
|    ent_coef        | 0.0705   |
|    ent_coef_loss   | -0.345   |
|    learning_rate   | 0.0003   |
|    n_updates       | 11987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -11.6    |
| time/              |          |
|    episodes        | 400      |
|    fps             | 142      |
|    time_elapsed    | 675      |
|    total_timesteps | 96000    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -11.2    |
| time/              |          |
|    episodes        | 404      |
|    fps             | 142      |
|    time_elapsed    | 687      |
|    total_timesteps | 97920    |
| train/             |          |
|    actor_loss      | 7.48     |
|    critic_loss     | 0.0488   |
|    ent_coef        | 0.0674   |
|    ent_coef_loss   | -0.0736  |
|    learning_rate   | 0.0003   |
|    n_updates       | 12227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -11.2    |
| time/              |          |
|    episodes        | 408      |
|    fps             | 142      |
|    time_elapsed    | 687      |
|    total_timesteps | 97920    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -10.8    |
| time/              |          |
|    episodes        | 412      |
|    fps             | 142      |
|    time_elapsed    | 700      |
|    total_timesteps | 99840    |
| train/             |          |
|    actor_loss      | 7.58     |
|    critic_loss     | 0.069    |
|    ent_coef        | 0.0645   |
|    ent_coef_loss   | -0.686   |
|    learning_rate   | 0.0003   |
|    n_updates       | 12467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -10.8    |
| time/              |          |
|    episodes        | 416      |
|    fps             | 142      |
|    time_elapsed    | 700      |
|    total_timesteps | 99840    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -10.4    |
| time/              |          |
|    episodes        | 420      |
|    fps             | 142      |
|    time_elapsed    | 714      |
|    total_timesteps | 101760   |
| train/             |          |
|    actor_loss      | 7.41     |
|    critic_loss     | 0.0121   |
|    ent_coef        | 0.0643   |
|    ent_coef_loss   | -0.784   |
|    learning_rate   | 0.0003   |
|    n_updates       | 12707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -10.4    |
| time/              |          |
|    episodes        | 424      |
|    fps             | 142      |
|    time_elapsed    | 714      |
|    total_timesteps | 101760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -9.98    |
| time/              |          |
|    episodes        | 428      |
|    fps             | 142      |
|    time_elapsed    | 726      |
|    total_timesteps | 103680   |
| train/             |          |
|    actor_loss      | 7.39     |
|    critic_loss     | 0.629    |
|    ent_coef        | 0.0624   |
|    ent_coef_loss   | 0.353    |
|    learning_rate   | 0.0003   |
|    n_updates       | 12947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -9.98    |
| time/              |          |
|    episodes        | 432      |
|    fps             | 142      |
|    time_elapsed    | 726      |
|    total_timesteps | 103680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -9.67    |
| time/              |          |
|    episodes        | 436      |
|    fps             | 142      |
|    time_elapsed    | 739      |
|    total_timesteps | 105600   |
| train/             |          |
|    actor_loss      | 7.32     |
|    critic_loss     | 0.00688  |
|    ent_coef        | 0.0611   |
|    ent_coef_loss   | -0.114   |
|    learning_rate   | 0.0003   |
|    n_updates       | 13187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -9.67    |
| time/              |          |
|    episodes        | 440      |
|    fps             | 142      |
|    time_elapsed    | 739      |
|    total_timesteps | 105600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -9.26    |
| time/              |          |
|    episodes        | 444      |
|    fps             | 142      |
|    time_elapsed    | 752      |
|    total_timesteps | 107520   |
| train/             |          |
|    actor_loss      | 7.29     |
|    critic_loss     | 0.0128   |
|    ent_coef        | 0.0593   |
|    ent_coef_loss   | 0.132    |
|    learning_rate   | 0.0003   |
|    n_updates       | 13427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -9.26    |
| time/              |          |
|    episodes        | 448      |
|    fps             | 142      |
|    time_elapsed    | 752      |
|    total_timesteps | 107520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -8.92    |
| time/              |          |
|    episodes        | 452      |
|    fps             | 143      |
|    time_elapsed    | 764      |
|    total_timesteps | 109440   |
| train/             |          |
|    actor_loss      | 7.33     |
|    critic_loss     | 0.0143   |
|    ent_coef        | 0.0596   |
|    ent_coef_loss   | -0.279   |
|    learning_rate   | 0.0003   |
|    n_updates       | 13667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -8.92    |
| time/              |          |
|    episodes        | 456      |
|    fps             | 143      |
|    time_elapsed    | 764      |
|    total_timesteps | 109440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -8.55    |
| time/              |          |
|    episodes        | 460      |
|    fps             | 143      |
|    time_elapsed    | 776      |
|    total_timesteps | 111360   |
| train/             |          |
|    actor_loss      | 7.47     |
|    critic_loss     | 0.0226   |
|    ent_coef        | 0.0572   |
|    ent_coef_loss   | -0.39    |
|    learning_rate   | 0.0003   |
|    n_updates       | 13907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -8.55    |
| time/              |          |
|    episodes        | 464      |
|    fps             | 143      |
|    time_elapsed    | 776      |
|    total_timesteps | 111360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -8.28    |
| time/              |          |
|    episodes        | 468      |
|    fps             | 143      |
|    time_elapsed    | 789      |
|    total_timesteps | 113280   |
| train/             |          |
|    actor_loss      | 7.29     |
|    critic_loss     | 0.00283  |
|    ent_coef        | 0.0557   |
|    ent_coef_loss   | -0.129   |
|    learning_rate   | 0.0003   |
|    n_updates       | 14147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -8.28    |
| time/              |          |
|    episodes        | 472      |
|    fps             | 143      |
|    time_elapsed    | 789      |
|    total_timesteps | 113280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -8.05    |
| time/              |          |
|    episodes        | 476      |
|    fps             | 143      |
|    time_elapsed    | 802      |
|    total_timesteps | 115200   |
| train/             |          |
|    actor_loss      | 7.58     |
|    critic_loss     | 0.0118   |
|    ent_coef        | 0.0541   |
|    ent_coef_loss   | 0.0287   |
|    learning_rate   | 0.0003   |
|    n_updates       | 14387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -8.05    |
| time/              |          |
|    episodes        | 480      |
|    fps             | 143      |
|    time_elapsed    | 802      |
|    total_timesteps | 115200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.82    |
| time/              |          |
|    episodes        | 484      |
|    fps             | 143      |
|    time_elapsed    | 815      |
|    total_timesteps | 117120   |
| train/             |          |
|    actor_loss      | 7.46     |
|    critic_loss     | 0.0132   |
|    ent_coef        | 0.0524   |
|    ent_coef_loss   | -0.226   |
|    learning_rate   | 0.0003   |
|    n_updates       | 14627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.82    |
| time/              |          |
|    episodes        | 488      |
|    fps             | 143      |
|    time_elapsed    | 815      |
|    total_timesteps | 117120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.64    |
| time/              |          |
|    episodes        | 492      |
|    fps             | 143      |
|    time_elapsed    | 827      |
|    total_timesteps | 119040   |
| train/             |          |
|    actor_loss      | 7.56     |
|    critic_loss     | 0.148    |
|    ent_coef        | 0.0515   |
|    ent_coef_loss   | -0.0242  |
|    learning_rate   | 0.0003   |
|    n_updates       | 14867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.64    |
| time/              |          |
|    episodes        | 496      |
|    fps             | 143      |
|    time_elapsed    | 827      |
|    total_timesteps | 119040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.45    |
| time/              |          |
|    episodes        | 500      |
|    fps             | 143      |
|    time_elapsed    | 841      |
|    total_timesteps | 120960   |
| train/             |          |
|    actor_loss      | 7.47     |
|    critic_loss     | 0.0158   |
|    ent_coef        | 0.0504   |
|    ent_coef_loss   | -0.282   |
|    learning_rate   | 0.0003   |
|    n_updates       | 15107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.45    |
| time/              |          |
|    episodes        | 504      |
|    fps             | 143      |
|    time_elapsed    | 841      |
|    total_timesteps | 120960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.2     |
| time/              |          |
|    episodes        | 508      |
|    fps             | 143      |
|    time_elapsed    | 854      |
|    total_timesteps | 122880   |
| train/             |          |
|    actor_loss      | 7.6      |
|    critic_loss     | 0.0597   |
|    ent_coef        | 0.0492   |
|    ent_coef_loss   | 0.14     |
|    learning_rate   | 0.0003   |
|    n_updates       | 15347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.2     |
| time/              |          |
|    episodes        | 512      |
|    fps             | 143      |
|    time_elapsed    | 854      |
|    total_timesteps | 122880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.06    |
| time/              |          |
|    episodes        | 516      |
|    fps             | 143      |
|    time_elapsed    | 867      |
|    total_timesteps | 124800   |
| train/             |          |
|    actor_loss      | 7.38     |
|    critic_loss     | 0.00402  |
|    ent_coef        | 0.0501   |
|    ent_coef_loss   | -0.134   |
|    learning_rate   | 0.0003   |
|    n_updates       | 15587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -7.06    |
| time/              |          |
|    episodes        | 520      |
|    fps             | 143      |
|    time_elapsed    | 867      |
|    total_timesteps | 124800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.91    |
| time/              |          |
|    episodes        | 524      |
|    fps             | 144      |
|    time_elapsed    | 879      |
|    total_timesteps | 126720   |
| train/             |          |
|    actor_loss      | 7.49     |
|    critic_loss     | 0.0059   |
|    ent_coef        | 0.0488   |
|    ent_coef_loss   | -0.118   |
|    learning_rate   | 0.0003   |
|    n_updates       | 15827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.91    |
| time/              |          |
|    episodes        | 528      |
|    fps             | 144      |
|    time_elapsed    | 879      |
|    total_timesteps | 126720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.75    |
| time/              |          |
|    episodes        | 532      |
|    fps             | 144      |
|    time_elapsed    | 892      |
|    total_timesteps | 128640   |
| train/             |          |
|    actor_loss      | 7.45     |
|    critic_loss     | 0.0508   |
|    ent_coef        | 0.0479   |
|    ent_coef_loss   | -0.327   |
|    learning_rate   | 0.0003   |
|    n_updates       | 16067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.75    |
| time/              |          |
|    episodes        | 536      |
|    fps             | 144      |
|    time_elapsed    | 892      |
|    total_timesteps | 128640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.66    |
| time/              |          |
|    episodes        | 540      |
|    fps             | 144      |
|    time_elapsed    | 906      |
|    total_timesteps | 130560   |
| train/             |          |
|    actor_loss      | 7.45     |
|    critic_loss     | 0.0124   |
|    ent_coef        | 0.0477   |
|    ent_coef_loss   | -0.0698  |
|    learning_rate   | 0.0003   |
|    n_updates       | 16307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.66    |
| time/              |          |
|    episodes        | 544      |
|    fps             | 144      |
|    time_elapsed    | 906      |
|    total_timesteps | 130560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.57    |
| time/              |          |
|    episodes        | 548      |
|    fps             | 143      |
|    time_elapsed    | 920      |
|    total_timesteps | 132480   |
| train/             |          |
|    actor_loss      | 7.45     |
|    critic_loss     | 0.0293   |
|    ent_coef        | 0.0469   |
|    ent_coef_loss   | 0.838    |
|    learning_rate   | 0.0003   |
|    n_updates       | 16547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.57    |
| time/              |          |
|    episodes        | 552      |
|    fps             | 143      |
|    time_elapsed    | 920      |
|    total_timesteps | 132480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.52    |
| time/              |          |
|    episodes        | 556      |
|    fps             | 143      |
|    time_elapsed    | 934      |
|    total_timesteps | 134400   |
| train/             |          |
|    actor_loss      | 7.5      |
|    critic_loss     | 0.00633  |
|    ent_coef        | 0.0472   |
|    ent_coef_loss   | -0.0963  |
|    learning_rate   | 0.0003   |
|    n_updates       | 16787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.52    |
| time/              |          |
|    episodes        | 560      |
|    fps             | 143      |
|    time_elapsed    | 934      |
|    total_timesteps | 134400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.43    |
| time/              |          |
|    episodes        | 564      |
|    fps             | 143      |
|    time_elapsed    | 947      |
|    total_timesteps | 136320   |
| train/             |          |
|    actor_loss      | 7.68     |
|    critic_loss     | 0.0236   |
|    ent_coef        | 0.0472   |
|    ent_coef_loss   | 0.00233  |
|    learning_rate   | 0.0003   |
|    n_updates       | 17027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.43    |
| time/              |          |
|    episodes        | 568      |
|    fps             | 143      |
|    time_elapsed    | 947      |
|    total_timesteps | 136320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.27    |
| time/              |          |
|    episodes        | 572      |
|    fps             | 143      |
|    time_elapsed    | 960      |
|    total_timesteps | 138240   |
| train/             |          |
|    actor_loss      | 7.68     |
|    critic_loss     | 0.0521   |
|    ent_coef        | 0.0467   |
|    ent_coef_loss   | -0.279   |
|    learning_rate   | 0.0003   |
|    n_updates       | 17267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.27    |
| time/              |          |
|    episodes        | 576      |
|    fps             | 143      |
|    time_elapsed    | 960      |
|    total_timesteps | 138240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.25    |
| time/              |          |
|    episodes        | 580      |
|    fps             | 143      |
|    time_elapsed    | 973      |
|    total_timesteps | 140160   |
| train/             |          |
|    actor_loss      | 7.85     |
|    critic_loss     | 0.016    |
|    ent_coef        | 0.0497   |
|    ent_coef_loss   | 1.41     |
|    learning_rate   | 0.0003   |
|    n_updates       | 17507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.25    |
| time/              |          |
|    episodes        | 584      |
|    fps             | 143      |
|    time_elapsed    | 973      |
|    total_timesteps | 140160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.24    |
| time/              |          |
|    episodes        | 588      |
|    fps             | 143      |
|    time_elapsed    | 987      |
|    total_timesteps | 142080   |
| train/             |          |
|    actor_loss      | 7.38     |
|    critic_loss     | 0.0135   |
|    ent_coef        | 0.0501   |
|    ent_coef_loss   | -0.193   |
|    learning_rate   | 0.0003   |
|    n_updates       | 17747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.24    |
| time/              |          |
|    episodes        | 592      |
|    fps             | 143      |
|    time_elapsed    | 987      |
|    total_timesteps | 142080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.22    |
| time/              |          |
|    episodes        | 596      |
|    fps             | 143      |
|    time_elapsed    | 1001     |
|    total_timesteps | 144000   |
| train/             |          |
|    actor_loss      | 7.53     |
|    critic_loss     | 0.00569  |
|    ent_coef        | 0.0503   |
|    ent_coef_loss   | -0.0861  |
|    learning_rate   | 0.0003   |
|    n_updates       | 17987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.22    |
| time/              |          |
|    episodes        | 600      |
|    fps             | 143      |
|    time_elapsed    | 1001     |
|    total_timesteps | 144000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.17    |
| time/              |          |
|    episodes        | 604      |
|    fps             | 143      |
|    time_elapsed    | 1015     |
|    total_timesteps | 145920   |
| train/             |          |
|    actor_loss      | 7.34     |
|    critic_loss     | 0.00152  |
|    ent_coef        | 0.0497   |
|    ent_coef_loss   | 0.172    |
|    learning_rate   | 0.0003   |
|    n_updates       | 18227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.17    |
| time/              |          |
|    episodes        | 608      |
|    fps             | 143      |
|    time_elapsed    | 1015     |
|    total_timesteps | 145920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.12    |
| time/              |          |
|    episodes        | 612      |
|    fps             | 143      |
|    time_elapsed    | 1028     |
|    total_timesteps | 147840   |
| train/             |          |
|    actor_loss      | 7.39     |
|    critic_loss     | 0.0157   |
|    ent_coef        | 0.0498   |
|    ent_coef_loss   | 0.0855   |
|    learning_rate   | 0.0003   |
|    n_updates       | 18467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.12    |
| time/              |          |
|    episodes        | 616      |
|    fps             | 143      |
|    time_elapsed    | 1028     |
|    total_timesteps | 147840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.1     |
| time/              |          |
|    episodes        | 620      |
|    fps             | 143      |
|    time_elapsed    | 1040     |
|    total_timesteps | 149760   |
| train/             |          |
|    actor_loss      | 7.61     |
|    critic_loss     | 0.0194   |
|    ent_coef        | 0.0501   |
|    ent_coef_loss   | 0.143    |
|    learning_rate   | 0.0003   |
|    n_updates       | 18707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.1     |
| time/              |          |
|    episodes        | 624      |
|    fps             | 143      |
|    time_elapsed    | 1040     |
|    total_timesteps | 149760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.08    |
| time/              |          |
|    episodes        | 628      |
|    fps             | 143      |
|    time_elapsed    | 1054     |
|    total_timesteps | 151680   |
| train/             |          |
|    actor_loss      | 7.36     |
|    critic_loss     | 0.00183  |
|    ent_coef        | 0.0494   |
|    ent_coef_loss   | -0.182   |
|    learning_rate   | 0.0003   |
|    n_updates       | 18947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.08    |
| time/              |          |
|    episodes        | 632      |
|    fps             | 143      |
|    time_elapsed    | 1054     |
|    total_timesteps | 151680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.05    |
| time/              |          |
|    episodes        | 636      |
|    fps             | 143      |
|    time_elapsed    | 1068     |
|    total_timesteps | 153600   |
| train/             |          |
|    actor_loss      | 7.47     |
|    critic_loss     | 0.015    |
|    ent_coef        | 0.0504   |
|    ent_coef_loss   | 0.339    |
|    learning_rate   | 0.0003   |
|    n_updates       | 19187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.05    |
| time/              |          |
|    episodes        | 640      |
|    fps             | 143      |
|    time_elapsed    | 1068     |
|    total_timesteps | 153600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.05    |
| time/              |          |
|    episodes        | 644      |
|    fps             | 143      |
|    time_elapsed    | 1082     |
|    total_timesteps | 155520   |
| train/             |          |
|    actor_loss      | 7.66     |
|    critic_loss     | 0.0346   |
|    ent_coef        | 0.0495   |
|    ent_coef_loss   | 0.308    |
|    learning_rate   | 0.0003   |
|    n_updates       | 19427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.05    |
| time/              |          |
|    episodes        | 648      |
|    fps             | 143      |
|    time_elapsed    | 1082     |
|    total_timesteps | 155520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.05    |
| time/              |          |
|    episodes        | 652      |
|    fps             | 143      |
|    time_elapsed    | 1096     |
|    total_timesteps | 157440   |
| train/             |          |
|    actor_loss      | 7.51     |
|    critic_loss     | 0.00487  |
|    ent_coef        | 0.0498   |
|    ent_coef_loss   | -0.124   |
|    learning_rate   | 0.0003   |
|    n_updates       | 19667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.05    |
| time/              |          |
|    episodes        | 656      |
|    fps             | 143      |
|    time_elapsed    | 1096     |
|    total_timesteps | 157440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.05    |
| time/              |          |
|    episodes        | 660      |
|    fps             | 143      |
|    time_elapsed    | 1110     |
|    total_timesteps | 159360   |
| train/             |          |
|    actor_loss      | 7.51     |
|    critic_loss     | 0.012    |
|    ent_coef        | 0.0489   |
|    ent_coef_loss   | -0.09    |
|    learning_rate   | 0.0003   |
|    n_updates       | 19907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.05    |
| time/              |          |
|    episodes        | 664      |
|    fps             | 143      |
|    time_elapsed    | 1110     |
|    total_timesteps | 159360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.17    |
| time/              |          |
|    episodes        | 668      |
|    fps             | 143      |
|    time_elapsed    | 1124     |
|    total_timesteps | 161280   |
| train/             |          |
|    actor_loss      | 7.45     |
|    critic_loss     | 0.0161   |
|    ent_coef        | 0.0517   |
|    ent_coef_loss   | -0.451   |
|    learning_rate   | 0.0003   |
|    n_updates       | 20147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.17    |
| time/              |          |
|    episodes        | 672      |
|    fps             | 143      |
|    time_elapsed    | 1124     |
|    total_timesteps | 161280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.18    |
| time/              |          |
|    episodes        | 676      |
|    fps             | 143      |
|    time_elapsed    | 1138     |
|    total_timesteps | 163200   |
| train/             |          |
|    actor_loss      | 7.7      |
|    critic_loss     | 0.0125   |
|    ent_coef        | 0.0515   |
|    ent_coef_loss   | 0.601    |
|    learning_rate   | 0.0003   |
|    n_updates       | 20387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.18    |
| time/              |          |
|    episodes        | 680      |
|    fps             | 143      |
|    time_elapsed    | 1138     |
|    total_timesteps | 163200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.18    |
| time/              |          |
|    episodes        | 684      |
|    fps             | 143      |
|    time_elapsed    | 1152     |
|    total_timesteps | 165120   |
| train/             |          |
|    actor_loss      | 7.44     |
|    critic_loss     | 0.00353  |
|    ent_coef        | 0.0509   |
|    ent_coef_loss   | 0.0839   |
|    learning_rate   | 0.0003   |
|    n_updates       | 20627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.18    |
| time/              |          |
|    episodes        | 688      |
|    fps             | 143      |
|    time_elapsed    | 1152     |
|    total_timesteps | 165120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.16    |
| time/              |          |
|    episodes        | 692      |
|    fps             | 143      |
|    time_elapsed    | 1166     |
|    total_timesteps | 167040   |
| train/             |          |
|    actor_loss      | 7.43     |
|    critic_loss     | 0.0102   |
|    ent_coef        | 0.0501   |
|    ent_coef_loss   | 0.228    |
|    learning_rate   | 0.0003   |
|    n_updates       | 20867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.16    |
| time/              |          |
|    episodes        | 696      |
|    fps             | 143      |
|    time_elapsed    | 1166     |
|    total_timesteps | 167040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.17    |
| time/              |          |
|    episodes        | 700      |
|    fps             | 143      |
|    time_elapsed    | 1180     |
|    total_timesteps | 168960   |
| train/             |          |
|    actor_loss      | 7.39     |
|    critic_loss     | 0.00579  |
|    ent_coef        | 0.0498   |
|    ent_coef_loss   | -0.215   |
|    learning_rate   | 0.0003   |
|    n_updates       | 21107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.17    |
| time/              |          |
|    episodes        | 704      |
|    fps             | 143      |
|    time_elapsed    | 1180     |
|    total_timesteps | 168960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.22    |
| time/              |          |
|    episodes        | 708      |
|    fps             | 143      |
|    time_elapsed    | 1194     |
|    total_timesteps | 170880   |
| train/             |          |
|    actor_loss      | 7.38     |
|    critic_loss     | 0.00426  |
|    ent_coef        | 0.0496   |
|    ent_coef_loss   | -0.322   |
|    learning_rate   | 0.0003   |
|    n_updates       | 21347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.22    |
| time/              |          |
|    episodes        | 712      |
|    fps             | 143      |
|    time_elapsed    | 1194     |
|    total_timesteps | 170880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.23    |
| time/              |          |
|    episodes        | 716      |
|    fps             | 143      |
|    time_elapsed    | 1208     |
|    total_timesteps | 172800   |
| train/             |          |
|    actor_loss      | 7.48     |
|    critic_loss     | 0.00267  |
|    ent_coef        | 0.0488   |
|    ent_coef_loss   | 0.0184   |
|    learning_rate   | 0.0003   |
|    n_updates       | 21587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.23    |
| time/              |          |
|    episodes        | 720      |
|    fps             | 143      |
|    time_elapsed    | 1208     |
|    total_timesteps | 172800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.23    |
| time/              |          |
|    episodes        | 724      |
|    fps             | 142      |
|    time_elapsed    | 1222     |
|    total_timesteps | 174720   |
| train/             |          |
|    actor_loss      | 7.44     |
|    critic_loss     | 0.00674  |
|    ent_coef        | 0.0478   |
|    ent_coef_loss   | -0.0743  |
|    learning_rate   | 0.0003   |
|    n_updates       | 21827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.23    |
| time/              |          |
|    episodes        | 728      |
|    fps             | 142      |
|    time_elapsed    | 1222     |
|    total_timesteps | 174720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.18    |
| time/              |          |
|    episodes        | 732      |
|    fps             | 142      |
|    time_elapsed    | 1235     |
|    total_timesteps | 176640   |
| train/             |          |
|    actor_loss      | 7.46     |
|    critic_loss     | 0.0278   |
|    ent_coef        | 0.047    |
|    ent_coef_loss   | -0.105   |
|    learning_rate   | 0.0003   |
|    n_updates       | 22067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.18    |
| time/              |          |
|    episodes        | 736      |
|    fps             | 142      |
|    time_elapsed    | 1235     |
|    total_timesteps | 176640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.2     |
| time/              |          |
|    episodes        | 740      |
|    fps             | 142      |
|    time_elapsed    | 1249     |
|    total_timesteps | 178560   |
| train/             |          |
|    actor_loss      | 7.58     |
|    critic_loss     | 0.0165   |
|    ent_coef        | 0.0494   |
|    ent_coef_loss   | 0.0173   |
|    learning_rate   | 0.0003   |
|    n_updates       | 22307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.2     |
| time/              |          |
|    episodes        | 744      |
|    fps             | 142      |
|    time_elapsed    | 1249     |
|    total_timesteps | 178560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.24    |
| time/              |          |
|    episodes        | 748      |
|    fps             | 142      |
|    time_elapsed    | 1263     |
|    total_timesteps | 180480   |
| train/             |          |
|    actor_loss      | 7.58     |
|    critic_loss     | 0.0176   |
|    ent_coef        | 0.0508   |
|    ent_coef_loss   | -0.13    |
|    learning_rate   | 0.0003   |
|    n_updates       | 22547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.24    |
| time/              |          |
|    episodes        | 752      |
|    fps             | 142      |
|    time_elapsed    | 1263     |
|    total_timesteps | 180480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.27    |
| time/              |          |
|    episodes        | 756      |
|    fps             | 142      |
|    time_elapsed    | 1277     |
|    total_timesteps | 182400   |
| train/             |          |
|    actor_loss      | 7.54     |
|    critic_loss     | 0.00306  |
|    ent_coef        | 0.0498   |
|    ent_coef_loss   | -0.565   |
|    learning_rate   | 0.0003   |
|    n_updates       | 22787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.27    |
| time/              |          |
|    episodes        | 760      |
|    fps             | 142      |
|    time_elapsed    | 1277     |
|    total_timesteps | 182400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.31    |
| time/              |          |
|    episodes        | 764      |
|    fps             | 142      |
|    time_elapsed    | 1291     |
|    total_timesteps | 184320   |
| train/             |          |
|    actor_loss      | 7.36     |
|    critic_loss     | 0.00267  |
|    ent_coef        | 0.0498   |
|    ent_coef_loss   | -0.253   |
|    learning_rate   | 0.0003   |
|    n_updates       | 23027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.31    |
| time/              |          |
|    episodes        | 768      |
|    fps             | 142      |
|    time_elapsed    | 1291     |
|    total_timesteps | 184320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.31    |
| time/              |          |
|    episodes        | 772      |
|    fps             | 142      |
|    time_elapsed    | 1305     |
|    total_timesteps | 186240   |
| train/             |          |
|    actor_loss      | 7.42     |
|    critic_loss     | 0.0029   |
|    ent_coef        | 0.0489   |
|    ent_coef_loss   | -0.0549  |
|    learning_rate   | 0.0003   |
|    n_updates       | 23267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.31    |
| time/              |          |
|    episodes        | 776      |
|    fps             | 142      |
|    time_elapsed    | 1305     |
|    total_timesteps | 186240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.28    |
| time/              |          |
|    episodes        | 780      |
|    fps             | 142      |
|    time_elapsed    | 1318     |
|    total_timesteps | 188160   |
| train/             |          |
|    actor_loss      | 7.58     |
|    critic_loss     | 0.0527   |
|    ent_coef        | 0.0486   |
|    ent_coef_loss   | -0.449   |
|    learning_rate   | 0.0003   |
|    n_updates       | 23507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.28    |
| time/              |          |
|    episodes        | 784      |
|    fps             | 142      |
|    time_elapsed    | 1318     |
|    total_timesteps | 188160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.25    |
| time/              |          |
|    episodes        | 788      |
|    fps             | 142      |
|    time_elapsed    | 1332     |
|    total_timesteps | 190080   |
| train/             |          |
|    actor_loss      | 7.58     |
|    critic_loss     | 0.0053   |
|    ent_coef        | 0.0479   |
|    ent_coef_loss   | -0.279   |
|    learning_rate   | 0.0003   |
|    n_updates       | 23747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.25    |
| time/              |          |
|    episodes        | 792      |
|    fps             | 142      |
|    time_elapsed    | 1332     |
|    total_timesteps | 190080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.25    |
| time/              |          |
|    episodes        | 796      |
|    fps             | 142      |
|    time_elapsed    | 1345     |
|    total_timesteps | 192000   |
| train/             |          |
|    actor_loss      | 7.47     |
|    critic_loss     | 0.00191  |
|    ent_coef        | 0.0499   |
|    ent_coef_loss   | 0.423    |
|    learning_rate   | 0.0003   |
|    n_updates       | 23987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.25    |
| time/              |          |
|    episodes        | 800      |
|    fps             | 142      |
|    time_elapsed    | 1345     |
|    total_timesteps | 192000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.26    |
| time/              |          |
|    episodes        | 804      |
|    fps             | 142      |
|    time_elapsed    | 1358     |
|    total_timesteps | 193920   |
| train/             |          |
|    actor_loss      | 7.42     |
|    critic_loss     | 0.00416  |
|    ent_coef        | 0.0493   |
|    ent_coef_loss   | -0.283   |
|    learning_rate   | 0.0003   |
|    n_updates       | 24227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.26    |
| time/              |          |
|    episodes        | 808      |
|    fps             | 142      |
|    time_elapsed    | 1358     |
|    total_timesteps | 193920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.3     |
| time/              |          |
|    episodes        | 812      |
|    fps             | 142      |
|    time_elapsed    | 1372     |
|    total_timesteps | 195840   |
| train/             |          |
|    actor_loss      | 7.58     |
|    critic_loss     | 0.0139   |
|    ent_coef        | 0.0481   |
|    ent_coef_loss   | -0.0726  |
|    learning_rate   | 0.0003   |
|    n_updates       | 24467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.3     |
| time/              |          |
|    episodes        | 816      |
|    fps             | 142      |
|    time_elapsed    | 1372     |
|    total_timesteps | 195840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.29    |
| time/              |          |
|    episodes        | 820      |
|    fps             | 142      |
|    time_elapsed    | 1385     |
|    total_timesteps | 197760   |
| train/             |          |
|    actor_loss      | 7.49     |
|    critic_loss     | 0.00338  |
|    ent_coef        | 0.0494   |
|    ent_coef_loss   | -0.1     |
|    learning_rate   | 0.0003   |
|    n_updates       | 24707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.29    |
| time/              |          |
|    episodes        | 824      |
|    fps             | 142      |
|    time_elapsed    | 1385     |
|    total_timesteps | 197760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.3     |
| time/              |          |
|    episodes        | 828      |
|    fps             | 142      |
|    time_elapsed    | 1399     |
|    total_timesteps | 199680   |
| train/             |          |
|    actor_loss      | 7.4      |
|    critic_loss     | 0.00417  |
|    ent_coef        | 0.0484   |
|    ent_coef_loss   | -0.0342  |
|    learning_rate   | 0.0003   |
|    n_updates       | 24947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.3     |
| time/              |          |
|    episodes        | 832      |
|    fps             | 142      |
|    time_elapsed    | 1399     |
|    total_timesteps | 199680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.3     |
| time/              |          |
|    episodes        | 836      |
|    fps             | 142      |
|    time_elapsed    | 1413     |
|    total_timesteps | 201600   |
| train/             |          |
|    actor_loss      | 7.43     |
|    critic_loss     | 0.0014   |
|    ent_coef        | 0.0473   |
|    ent_coef_loss   | -0.293   |
|    learning_rate   | 0.0003   |
|    n_updates       | 25187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.3     |
| time/              |          |
|    episodes        | 840      |
|    fps             | 142      |
|    time_elapsed    | 1413     |
|    total_timesteps | 201600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.28    |
| time/              |          |
|    episodes        | 844      |
|    fps             | 142      |
|    time_elapsed    | 1427     |
|    total_timesteps | 203520   |
| train/             |          |
|    actor_loss      | 7.39     |
|    critic_loss     | 0.000722 |
|    ent_coef        | 0.0462   |
|    ent_coef_loss   | -0.145   |
|    learning_rate   | 0.0003   |
|    n_updates       | 25427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.28    |
| time/              |          |
|    episodes        | 848      |
|    fps             | 142      |
|    time_elapsed    | 1427     |
|    total_timesteps | 203520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.25    |
| time/              |          |
|    episodes        | 852      |
|    fps             | 142      |
|    time_elapsed    | 1441     |
|    total_timesteps | 205440   |
| train/             |          |
|    actor_loss      | 7.38     |
|    critic_loss     | 0.000572 |
|    ent_coef        | 0.0454   |
|    ent_coef_loss   | -0.0949  |
|    learning_rate   | 0.0003   |
|    n_updates       | 25667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.25    |
| time/              |          |
|    episodes        | 856      |
|    fps             | 142      |
|    time_elapsed    | 1441     |
|    total_timesteps | 205440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.15    |
| time/              |          |
|    episodes        | 860      |
|    fps             | 142      |
|    time_elapsed    | 1455     |
|    total_timesteps | 207360   |
| train/             |          |
|    actor_loss      | 7.37     |
|    critic_loss     | 0.000305 |
|    ent_coef        | 0.0448   |
|    ent_coef_loss   | 0.109    |
|    learning_rate   | 0.0003   |
|    n_updates       | 25907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.15    |
| time/              |          |
|    episodes        | 864      |
|    fps             | 142      |
|    time_elapsed    | 1455     |
|    total_timesteps | 207360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.08    |
| time/              |          |
|    episodes        | 868      |
|    fps             | 142      |
|    time_elapsed    | 1470     |
|    total_timesteps | 209280   |
| train/             |          |
|    actor_loss      | 7.36     |
|    critic_loss     | 0.000282 |
|    ent_coef        | 0.0443   |
|    ent_coef_loss   | 0.00955  |
|    learning_rate   | 0.0003   |
|    n_updates       | 26147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -6.08    |
| time/              |          |
|    episodes        | 872      |
|    fps             | 142      |
|    time_elapsed    | 1470     |
|    total_timesteps | 209280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.98    |
| time/              |          |
|    episodes        | 876      |
|    fps             | 142      |
|    time_elapsed    | 1484     |
|    total_timesteps | 211200   |
| train/             |          |
|    actor_loss      | 7.35     |
|    critic_loss     | 0.000268 |
|    ent_coef        | 0.0438   |
|    ent_coef_loss   | -0.114   |
|    learning_rate   | 0.0003   |
|    n_updates       | 26387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.98    |
| time/              |          |
|    episodes        | 880      |
|    fps             | 142      |
|    time_elapsed    | 1484     |
|    total_timesteps | 211200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.91    |
| time/              |          |
|    episodes        | 884      |
|    fps             | 142      |
|    time_elapsed    | 1497     |
|    total_timesteps | 213120   |
| train/             |          |
|    actor_loss      | 7.35     |
|    critic_loss     | 0.000326 |
|    ent_coef        | 0.0435   |
|    ent_coef_loss   | -0.033   |
|    learning_rate   | 0.0003   |
|    n_updates       | 26627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.91    |
| time/              |          |
|    episodes        | 888      |
|    fps             | 142      |
|    time_elapsed    | 1497     |
|    total_timesteps | 213120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.85    |
| time/              |          |
|    episodes        | 892      |
|    fps             | 142      |
|    time_elapsed    | 1511     |
|    total_timesteps | 215040   |
| train/             |          |
|    actor_loss      | 7.35     |
|    critic_loss     | 0.000296 |
|    ent_coef        | 0.0433   |
|    ent_coef_loss   | 0.0545   |
|    learning_rate   | 0.0003   |
|    n_updates       | 26867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.85    |
| time/              |          |
|    episodes        | 896      |
|    fps             | 142      |
|    time_elapsed    | 1511     |
|    total_timesteps | 215040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.78    |
| time/              |          |
|    episodes        | 900      |
|    fps             | 142      |
|    time_elapsed    | 1525     |
|    total_timesteps | 216960   |
| train/             |          |
|    actor_loss      | 7.35     |
|    critic_loss     | 0.000291 |
|    ent_coef        | 0.043    |
|    ent_coef_loss   | 0.000298 |
|    learning_rate   | 0.0003   |
|    n_updates       | 27107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.78    |
| time/              |          |
|    episodes        | 904      |
|    fps             | 142      |
|    time_elapsed    | 1525     |
|    total_timesteps | 216960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.71    |
| time/              |          |
|    episodes        | 908      |
|    fps             | 142      |
|    time_elapsed    | 1539     |
|    total_timesteps | 218880   |
| train/             |          |
|    actor_loss      | 7.33     |
|    critic_loss     | 0.00028  |
|    ent_coef        | 0.0429   |
|    ent_coef_loss   | -0.0205  |
|    learning_rate   | 0.0003   |
|    n_updates       | 27347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.71    |
| time/              |          |
|    episodes        | 912      |
|    fps             | 142      |
|    time_elapsed    | 1539     |
|    total_timesteps | 218880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.65    |
| time/              |          |
|    episodes        | 916      |
|    fps             | 142      |
|    time_elapsed    | 1553     |
|    total_timesteps | 220800   |
| train/             |          |
|    actor_loss      | 7.33     |
|    critic_loss     | 0.000333 |
|    ent_coef        | 0.0428   |
|    ent_coef_loss   | 0.368    |
|    learning_rate   | 0.0003   |
|    n_updates       | 27587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.65    |
| time/              |          |
|    episodes        | 920      |
|    fps             | 142      |
|    time_elapsed    | 1553     |
|    total_timesteps | 220800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.62    |
| time/              |          |
|    episodes        | 924      |
|    fps             | 142      |
|    time_elapsed    | 1566     |
|    total_timesteps | 222720   |
| train/             |          |
|    actor_loss      | 7.32     |
|    critic_loss     | 0.000254 |
|    ent_coef        | 0.0427   |
|    ent_coef_loss   | 0.141    |
|    learning_rate   | 0.0003   |
|    n_updates       | 27827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.62    |
| time/              |          |
|    episodes        | 928      |
|    fps             | 142      |
|    time_elapsed    | 1566     |
|    total_timesteps | 222720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.59    |
| time/              |          |
|    episodes        | 932      |
|    fps             | 142      |
|    time_elapsed    | 1580     |
|    total_timesteps | 224640   |
| train/             |          |
|    actor_loss      | 7.31     |
|    critic_loss     | 0.000329 |
|    ent_coef        | 0.0426   |
|    ent_coef_loss   | -0.117   |
|    learning_rate   | 0.0003   |
|    n_updates       | 28067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.59    |
| time/              |          |
|    episodes        | 936      |
|    fps             | 142      |
|    time_elapsed    | 1580     |
|    total_timesteps | 224640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.54    |
| time/              |          |
|    episodes        | 940      |
|    fps             | 142      |
|    time_elapsed    | 1594     |
|    total_timesteps | 226560   |
| train/             |          |
|    actor_loss      | 7.31     |
|    critic_loss     | 0.000337 |
|    ent_coef        | 0.0426   |
|    ent_coef_loss   | 0.0404   |
|    learning_rate   | 0.0003   |
|    n_updates       | 28307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.54    |
| time/              |          |
|    episodes        | 944      |
|    fps             | 142      |
|    time_elapsed    | 1594     |
|    total_timesteps | 226560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 948      |
|    fps             | 142      |
|    time_elapsed    | 1608     |
|    total_timesteps | 228480   |
| train/             |          |
|    actor_loss      | 7.31     |
|    critic_loss     | 0.000299 |
|    ent_coef        | 0.0426   |
|    ent_coef_loss   | 0.135    |
|    learning_rate   | 0.0003   |
|    n_updates       | 28547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 952      |
|    fps             | 142      |
|    time_elapsed    | 1608     |
|    total_timesteps | 228480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.42    |
| time/              |          |
|    episodes        | 956      |
|    fps             | 142      |
|    time_elapsed    | 1621     |
|    total_timesteps | 230400   |
| train/             |          |
|    actor_loss      | 7.29     |
|    critic_loss     | 0.000316 |
|    ent_coef        | 0.0425   |
|    ent_coef_loss   | -0.018   |
|    learning_rate   | 0.0003   |
|    n_updates       | 28787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.42    |
| time/              |          |
|    episodes        | 960      |
|    fps             | 142      |
|    time_elapsed    | 1621     |
|    total_timesteps | 230400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.39    |
| time/              |          |
|    episodes        | 964      |
|    fps             | 142      |
|    time_elapsed    | 1635     |
|    total_timesteps | 232320   |
| train/             |          |
|    actor_loss      | 7.3      |
|    critic_loss     | 0.000327 |
|    ent_coef        | 0.0425   |
|    ent_coef_loss   | 0.0315   |
|    learning_rate   | 0.0003   |
|    n_updates       | 29027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.39    |
| time/              |          |
|    episodes        | 968      |
|    fps             | 142      |
|    time_elapsed    | 1635     |
|    total_timesteps | 232320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.4     |
| time/              |          |
|    episodes        | 972      |
|    fps             | 141      |
|    time_elapsed    | 1650     |
|    total_timesteps | 234240   |
| train/             |          |
|    actor_loss      | 7.29     |
|    critic_loss     | 0.000271 |
|    ent_coef        | 0.0425   |
|    ent_coef_loss   | 0.109    |
|    learning_rate   | 0.0003   |
|    n_updates       | 29267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.4     |
| time/              |          |
|    episodes        | 976      |
|    fps             | 141      |
|    time_elapsed    | 1650     |
|    total_timesteps | 234240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.43    |
| time/              |          |
|    episodes        | 980      |
|    fps             | 141      |
|    time_elapsed    | 1664     |
|    total_timesteps | 236160   |
| train/             |          |
|    actor_loss      | 7.28     |
|    critic_loss     | 0.000301 |
|    ent_coef        | 0.0425   |
|    ent_coef_loss   | 0.119    |
|    learning_rate   | 0.0003   |
|    n_updates       | 29507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.43    |
| time/              |          |
|    episodes        | 984      |
|    fps             | 141      |
|    time_elapsed    | 1664     |
|    total_timesteps | 236160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.41    |
| time/              |          |
|    episodes        | 988      |
|    fps             | 141      |
|    time_elapsed    | 1679     |
|    total_timesteps | 238080   |
| train/             |          |
|    actor_loss      | 7.28     |
|    critic_loss     | 0.000297 |
|    ent_coef        | 0.0423   |
|    ent_coef_loss   | -0.188   |
|    learning_rate   | 0.0003   |
|    n_updates       | 29747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.41    |
| time/              |          |
|    episodes        | 992      |
|    fps             | 141      |
|    time_elapsed    | 1679     |
|    total_timesteps | 238080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.43    |
| time/              |          |
|    episodes        | 996      |
|    fps             | 141      |
|    time_elapsed    | 1693     |
|    total_timesteps | 240000   |
| train/             |          |
|    actor_loss      | 7.27     |
|    critic_loss     | 0.000293 |
|    ent_coef        | 0.0423   |
|    ent_coef_loss   | 0.0253   |
|    learning_rate   | 0.0003   |
|    n_updates       | 29987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.43    |
| time/              |          |
|    episodes        | 1000     |
|    fps             | 141      |
|    time_elapsed    | 1693     |
|    total_timesteps | 240000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.44    |
| time/              |          |
|    episodes        | 1004     |
|    fps             | 141      |
|    time_elapsed    | 1707     |
|    total_timesteps | 241920   |
| train/             |          |
|    actor_loss      | 7.26     |
|    critic_loss     | 0.000309 |
|    ent_coef        | 0.0423   |
|    ent_coef_loss   | 0.276    |
|    learning_rate   | 0.0003   |
|    n_updates       | 30227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.44    |
| time/              |          |
|    episodes        | 1008     |
|    fps             | 141      |
|    time_elapsed    | 1707     |
|    total_timesteps | 241920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1012     |
|    fps             | 141      |
|    time_elapsed    | 1720     |
|    total_timesteps | 243840   |
| train/             |          |
|    actor_loss      | 7.26     |
|    critic_loss     | 0.000458 |
|    ent_coef        | 0.0423   |
|    ent_coef_loss   | 0.0424   |
|    learning_rate   | 0.0003   |
|    n_updates       | 30467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1016     |
|    fps             | 141      |
|    time_elapsed    | 1720     |
|    total_timesteps | 243840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1020     |
|    fps             | 141      |
|    time_elapsed    | 1734     |
|    total_timesteps | 245760   |
| train/             |          |
|    actor_loss      | 7.24     |
|    critic_loss     | 0.000283 |
|    ent_coef        | 0.0424   |
|    ent_coef_loss   | -0.207   |
|    learning_rate   | 0.0003   |
|    n_updates       | 30707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1024     |
|    fps             | 141      |
|    time_elapsed    | 1734     |
|    total_timesteps | 245760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.44    |
| time/              |          |
|    episodes        | 1028     |
|    fps             | 141      |
|    time_elapsed    | 1748     |
|    total_timesteps | 247680   |
| train/             |          |
|    actor_loss      | 7.23     |
|    critic_loss     | 0.000302 |
|    ent_coef        | 0.0423   |
|    ent_coef_loss   | -0.186   |
|    learning_rate   | 0.0003   |
|    n_updates       | 30947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.44    |
| time/              |          |
|    episodes        | 1032     |
|    fps             | 141      |
|    time_elapsed    | 1748     |
|    total_timesteps | 247680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1036     |
|    fps             | 141      |
|    time_elapsed    | 1761     |
|    total_timesteps | 249600   |
| train/             |          |
|    actor_loss      | 7.23     |
|    critic_loss     | 0.000235 |
|    ent_coef        | 0.0422   |
|    ent_coef_loss   | -0.0115  |
|    learning_rate   | 0.0003   |
|    n_updates       | 31187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1040     |
|    fps             | 141      |
|    time_elapsed    | 1761     |
|    total_timesteps | 249600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1044     |
|    fps             | 141      |
|    time_elapsed    | 1776     |
|    total_timesteps | 251520   |
| train/             |          |
|    actor_loss      | 7.2      |
|    critic_loss     | 0.000292 |
|    ent_coef        | 0.0422   |
|    ent_coef_loss   | -0.165   |
|    learning_rate   | 0.0003   |
|    n_updates       | 31427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1048     |
|    fps             | 141      |
|    time_elapsed    | 1776     |
|    total_timesteps | 251520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1052     |
|    fps             | 141      |
|    time_elapsed    | 1790     |
|    total_timesteps | 253440   |
| train/             |          |
|    actor_loss      | 7.22     |
|    critic_loss     | 0.000259 |
|    ent_coef        | 0.042    |
|    ent_coef_loss   | 0.432    |
|    learning_rate   | 0.0003   |
|    n_updates       | 31667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1056     |
|    fps             | 141      |
|    time_elapsed    | 1790     |
|    total_timesteps | 253440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1060     |
|    fps             | 141      |
|    time_elapsed    | 1804     |
|    total_timesteps | 255360   |
| train/             |          |
|    actor_loss      | 7.21     |
|    critic_loss     | 0.000319 |
|    ent_coef        | 0.0422   |
|    ent_coef_loss   | 0.112    |
|    learning_rate   | 0.0003   |
|    n_updates       | 31907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1064     |
|    fps             | 141      |
|    time_elapsed    | 1804     |
|    total_timesteps | 255360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1068     |
|    fps             | 141      |
|    time_elapsed    | 1818     |
|    total_timesteps | 257280   |
| train/             |          |
|    actor_loss      | 7.21     |
|    critic_loss     | 0.000317 |
|    ent_coef        | 0.0422   |
|    ent_coef_loss   | -0.0528  |
|    learning_rate   | 0.0003   |
|    n_updates       | 32147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1072     |
|    fps             | 141      |
|    time_elapsed    | 1818     |
|    total_timesteps | 257280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1076     |
|    fps             | 141      |
|    time_elapsed    | 1832     |
|    total_timesteps | 259200   |
| train/             |          |
|    actor_loss      | 7.2      |
|    critic_loss     | 0.000367 |
|    ent_coef        | 0.0421   |
|    ent_coef_loss   | -0.0196  |
|    learning_rate   | 0.0003   |
|    n_updates       | 32387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1080     |
|    fps             | 141      |
|    time_elapsed    | 1832     |
|    total_timesteps | 259200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1084     |
|    fps             | 141      |
|    time_elapsed    | 1846     |
|    total_timesteps | 261120   |
| train/             |          |
|    actor_loss      | 7.19     |
|    critic_loss     | 0.000291 |
|    ent_coef        | 0.042    |
|    ent_coef_loss   | 0.0255   |
|    learning_rate   | 0.0003   |
|    n_updates       | 32627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1088     |
|    fps             | 141      |
|    time_elapsed    | 1846     |
|    total_timesteps | 261120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.44    |
| time/              |          |
|    episodes        | 1092     |
|    fps             | 141      |
|    time_elapsed    | 1860     |
|    total_timesteps | 263040   |
| train/             |          |
|    actor_loss      | 7.17     |
|    critic_loss     | 0.000566 |
|    ent_coef        | 0.0421   |
|    ent_coef_loss   | -0.0782  |
|    learning_rate   | 0.0003   |
|    n_updates       | 32867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.44    |
| time/              |          |
|    episodes        | 1096     |
|    fps             | 141      |
|    time_elapsed    | 1860     |
|    total_timesteps | 263040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.42    |
| time/              |          |
|    episodes        | 1100     |
|    fps             | 141      |
|    time_elapsed    | 1874     |
|    total_timesteps | 264960   |
| train/             |          |
|    actor_loss      | 7.17     |
|    critic_loss     | 0.000302 |
|    ent_coef        | 0.042    |
|    ent_coef_loss   | -0.136   |
|    learning_rate   | 0.0003   |
|    n_updates       | 33107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.42    |
| time/              |          |
|    episodes        | 1104     |
|    fps             | 141      |
|    time_elapsed    | 1874     |
|    total_timesteps | 264960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.42    |
| time/              |          |
|    episodes        | 1108     |
|    fps             | 141      |
|    time_elapsed    | 1889     |
|    total_timesteps | 266880   |
| train/             |          |
|    actor_loss      | 7.16     |
|    critic_loss     | 0.000271 |
|    ent_coef        | 0.042    |
|    ent_coef_loss   | 0.015    |
|    learning_rate   | 0.0003   |
|    n_updates       | 33347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.42    |
| time/              |          |
|    episodes        | 1112     |
|    fps             | 141      |
|    time_elapsed    | 1889     |
|    total_timesteps | 266880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.4     |
| time/              |          |
|    episodes        | 1116     |
|    fps             | 141      |
|    time_elapsed    | 1903     |
|    total_timesteps | 268800   |
| train/             |          |
|    actor_loss      | 7.16     |
|    critic_loss     | 0.000378 |
|    ent_coef        | 0.042    |
|    ent_coef_loss   | -0.425   |
|    learning_rate   | 0.0003   |
|    n_updates       | 33587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.4     |
| time/              |          |
|    episodes        | 1120     |
|    fps             | 141      |
|    time_elapsed    | 1903     |
|    total_timesteps | 268800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.37    |
| time/              |          |
|    episodes        | 1124     |
|    fps             | 141      |
|    time_elapsed    | 1916     |
|    total_timesteps | 270720   |
| train/             |          |
|    actor_loss      | 7.17     |
|    critic_loss     | 0.000309 |
|    ent_coef        | 0.0419   |
|    ent_coef_loss   | -0.0434  |
|    learning_rate   | 0.0003   |
|    n_updates       | 33827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.37    |
| time/              |          |
|    episodes        | 1128     |
|    fps             | 141      |
|    time_elapsed    | 1916     |
|    total_timesteps | 270720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.39    |
| time/              |          |
|    episodes        | 1132     |
|    fps             | 141      |
|    time_elapsed    | 1930     |
|    total_timesteps | 272640   |
| train/             |          |
|    actor_loss      | 7.15     |
|    critic_loss     | 0.000247 |
|    ent_coef        | 0.0419   |
|    ent_coef_loss   | -0.124   |
|    learning_rate   | 0.0003   |
|    n_updates       | 34067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.39    |
| time/              |          |
|    episodes        | 1136     |
|    fps             | 141      |
|    time_elapsed    | 1930     |
|    total_timesteps | 272640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.4     |
| time/              |          |
|    episodes        | 1140     |
|    fps             | 141      |
|    time_elapsed    | 1944     |
|    total_timesteps | 274560   |
| train/             |          |
|    actor_loss      | 7.14     |
|    critic_loss     | 0.00029  |
|    ent_coef        | 0.042    |
|    ent_coef_loss   | -0.0588  |
|    learning_rate   | 0.0003   |
|    n_updates       | 34307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.4     |
| time/              |          |
|    episodes        | 1144     |
|    fps             | 141      |
|    time_elapsed    | 1944     |
|    total_timesteps | 274560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.39    |
| time/              |          |
|    episodes        | 1148     |
|    fps             | 141      |
|    time_elapsed    | 1957     |
|    total_timesteps | 276480   |
| train/             |          |
|    actor_loss      | 7.13     |
|    critic_loss     | 0.000277 |
|    ent_coef        | 0.0419   |
|    ent_coef_loss   | 0.0706   |
|    learning_rate   | 0.0003   |
|    n_updates       | 34547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.39    |
| time/              |          |
|    episodes        | 1152     |
|    fps             | 141      |
|    time_elapsed    | 1957     |
|    total_timesteps | 276480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.4     |
| time/              |          |
|    episodes        | 1156     |
|    fps             | 141      |
|    time_elapsed    | 1970     |
|    total_timesteps | 278400   |
| train/             |          |
|    actor_loss      | 7.13     |
|    critic_loss     | 0.000232 |
|    ent_coef        | 0.0419   |
|    ent_coef_loss   | -0.0838  |
|    learning_rate   | 0.0003   |
|    n_updates       | 34787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.4     |
| time/              |          |
|    episodes        | 1160     |
|    fps             | 141      |
|    time_elapsed    | 1970     |
|    total_timesteps | 278400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.42    |
| time/              |          |
|    episodes        | 1164     |
|    fps             | 141      |
|    time_elapsed    | 1983     |
|    total_timesteps | 280320   |
| train/             |          |
|    actor_loss      | 7.14     |
|    critic_loss     | 0.000388 |
|    ent_coef        | 0.0418   |
|    ent_coef_loss   | 0.11     |
|    learning_rate   | 0.0003   |
|    n_updates       | 35027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.42    |
| time/              |          |
|    episodes        | 1168     |
|    fps             | 141      |
|    time_elapsed    | 1983     |
|    total_timesteps | 280320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1172     |
|    fps             | 141      |
|    time_elapsed    | 1997     |
|    total_timesteps | 282240   |
| train/             |          |
|    actor_loss      | 7.13     |
|    critic_loss     | 0.00106  |
|    ent_coef        | 0.0417   |
|    ent_coef_loss   | 0.0236   |
|    learning_rate   | 0.0003   |
|    n_updates       | 35267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1176     |
|    fps             | 141      |
|    time_elapsed    | 1997     |
|    total_timesteps | 282240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1180     |
|    fps             | 141      |
|    time_elapsed    | 2011     |
|    total_timesteps | 284160   |
| train/             |          |
|    actor_loss      | 7.11     |
|    critic_loss     | 0.000352 |
|    ent_coef        | 0.0418   |
|    ent_coef_loss   | 0.176    |
|    learning_rate   | 0.0003   |
|    n_updates       | 35507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1184     |
|    fps             | 141      |
|    time_elapsed    | 2011     |
|    total_timesteps | 284160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1188     |
|    fps             | 141      |
|    time_elapsed    | 2024     |
|    total_timesteps | 286080   |
| train/             |          |
|    actor_loss      | 7.1      |
|    critic_loss     | 0.000222 |
|    ent_coef        | 0.0418   |
|    ent_coef_loss   | -0.113   |
|    learning_rate   | 0.0003   |
|    n_updates       | 35747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1192     |
|    fps             | 141      |
|    time_elapsed    | 2024     |
|    total_timesteps | 286080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1196     |
|    fps             | 141      |
|    time_elapsed    | 2038     |
|    total_timesteps | 288000   |
| train/             |          |
|    actor_loss      | 7.09     |
|    critic_loss     | 0.000216 |
|    ent_coef        | 0.0417   |
|    ent_coef_loss   | 0.0759   |
|    learning_rate   | 0.0003   |
|    n_updates       | 35987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1200     |
|    fps             | 141      |
|    time_elapsed    | 2038     |
|    total_timesteps | 288000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.44    |
| time/              |          |
|    episodes        | 1204     |
|    fps             | 141      |
|    time_elapsed    | 2054     |
|    total_timesteps | 289920   |
| train/             |          |
|    actor_loss      | 7.09     |
|    critic_loss     | 0.000308 |
|    ent_coef        | 0.0417   |
|    ent_coef_loss   | -0.172   |
|    learning_rate   | 0.0003   |
|    n_updates       | 36227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.44    |
| time/              |          |
|    episodes        | 1208     |
|    fps             | 141      |
|    time_elapsed    | 2054     |
|    total_timesteps | 289920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1212     |
|    fps             | 141      |
|    time_elapsed    | 2068     |
|    total_timesteps | 291840   |
| train/             |          |
|    actor_loss      | 7.08     |
|    critic_loss     | 0.000277 |
|    ent_coef        | 0.0417   |
|    ent_coef_loss   | -0.0299  |
|    learning_rate   | 0.0003   |
|    n_updates       | 36467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.45    |
| time/              |          |
|    episodes        | 1216     |
|    fps             | 141      |
|    time_elapsed    | 2068     |
|    total_timesteps | 291840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1220     |
|    fps             | 141      |
|    time_elapsed    | 2082     |
|    total_timesteps | 293760   |
| train/             |          |
|    actor_loss      | 7.09     |
|    critic_loss     | 0.00029  |
|    ent_coef        | 0.0417   |
|    ent_coef_loss   | -0.0631  |
|    learning_rate   | 0.0003   |
|    n_updates       | 36707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1224     |
|    fps             | 141      |
|    time_elapsed    | 2082     |
|    total_timesteps | 293760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.49    |
| time/              |          |
|    episodes        | 1228     |
|    fps             | 140      |
|    time_elapsed    | 2097     |
|    total_timesteps | 295680   |
| train/             |          |
|    actor_loss      | 7.07     |
|    critic_loss     | 0.000261 |
|    ent_coef        | 0.0417   |
|    ent_coef_loss   | -0.122   |
|    learning_rate   | 0.0003   |
|    n_updates       | 36947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.49    |
| time/              |          |
|    episodes        | 1232     |
|    fps             | 140      |
|    time_elapsed    | 2097     |
|    total_timesteps | 295680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1236     |
|    fps             | 140      |
|    time_elapsed    | 2111     |
|    total_timesteps | 297600   |
| train/             |          |
|    actor_loss      | 7.07     |
|    critic_loss     | 0.000278 |
|    ent_coef        | 0.0416   |
|    ent_coef_loss   | 0.141    |
|    learning_rate   | 0.0003   |
|    n_updates       | 37187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.47    |
| time/              |          |
|    episodes        | 1240     |
|    fps             | 140      |
|    time_elapsed    | 2111     |
|    total_timesteps | 297600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1244     |
|    fps             | 140      |
|    time_elapsed    | 2125     |
|    total_timesteps | 299520   |
| train/             |          |
|    actor_loss      | 7.06     |
|    critic_loss     | 0.00029  |
|    ent_coef        | 0.0416   |
|    ent_coef_loss   | -0.224   |
|    learning_rate   | 0.0003   |
|    n_updates       | 37427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -5.46    |
| time/              |          |
|    episodes        | 1248     |
|    fps             | 140      |
|    time_elapsed    | 2125     |
|    total_timesteps | 299520   |
---------------------------------
In [92]:
def nii_path_from_X_and_h(cm, X_path, h_path, A_notional=100.0, L_notional=100.0,
                         tau_A=3.0, tau_L=1.0, dt=1/12):
    T = X_path.shape[0]
    NII = np.zeros(T-1)

    for t in range(T-1):
        X_t, X_next = X_path[t], X_path[t+1]

        yA = float(afns_yields_from_factors(X_t, np.array([tau_A]), cm.lam, cm.sig1, cm.sig2, cm.sig3)[0])
        yL = float(afns_yields_from_factors(X_t, np.array([tau_L]), cm.lam, cm.sig1, cm.sig2, cm.sig3)[0])

        # forward
        y1 = float(afns_yields_from_factors(X_t, np.array([tau_L]), cm.lam, cm.sig1, cm.sig2, cm.sig3)[0])
        y2 = float(afns_yields_from_factors(X_t, np.array([tau_L + dt]), cm.lam, cm.sig1, cm.sig2, cm.sig3)[0])
        K_t = ((tau_L + dt) * y2 - tau_L * y1) / dt

        y_float_next = float(afns_yields_from_factors(X_next, np.array([tau_L]), cm.lam, cm.sig1, cm.sig2, cm.sig3)[0])

        NII[t] = (A_notional*yA*dt - L_notional*yL*dt + h_path[t]*(K_t - y_float_next)*dt)

    return NII
In [93]:
def rollout_policy_on_exogenous_X(cm, X_path, policy,
                                  K_lq=None, rl_model=None,
                                  X_ref=None, center_state=True,
                                  action_max=1.0,
                                  A_notional=100.0, L_notional=100.0,
                                  tau_A=3.0, tau_L=1.0, dt=1/12):
    """
    policy: "unhedged" | "lq" | "rl"
    Returns dict with h, u, NII.
    """
    T = X_path.shape[0]
    h = np.zeros(T)
    u = np.zeros(T-1)

    for t in range(T-1):
        if policy == "unhedged":
            u_t = 0.0
        elif policy == "lq":
            if K_lq is None:
                raise ValueError("Need K_lq for LQ.")
            x_t = np.array([X_path[t,0], X_path[t,1], X_path[t,2], h[t]], dtype=float)
            u_t = -float(K_lq @ x_t)
            u_t = float(np.clip(u_t, -action_max, action_max))
        elif policy == "rl":
            if rl_model is None:
                raise ValueError("Need rl_model for RL.")
            obs = np.array([X_path[t,0], X_path[t,1], X_path[t,2], h[t]], dtype=np.float32)
            if center_state and (X_ref is not None):
                obs[:3] -= X_ref.astype(np.float32)
            act, _ = rl_model.predict(obs, deterministic=True)
            u_t = float(np.clip(float(act[0]), -action_max, action_max))
        else:
            raise ValueError("Unknown policy.")

        u[t] = u_t
        h[t+1] = h[t] + u_t

    NII = nii_path_from_X_and_h(cm, X_path, h, A_notional, L_notional, tau_A, tau_L, dt)
    return {"h": h, "u": u, "NII": NII}
In [94]:
def summarize(N):
    return {
        "std": float(np.std(N)),
        "p05": float(np.quantile(N, 0.05)),
        "min": float(np.min(N)),
        "mean": float(np.mean(N)),
    }

def compare_on_scenario(name, X_path, K_lq, rl_model):
    res0 = rollout_policy_on_exogenous_X(cm, X_path, "unhedged", action_max=1.0, X_ref=X_ref)
    resL = rollout_policy_on_exogenous_X(cm, X_path, "lq", K_lq=K_lq, action_max=1.0, X_ref=X_ref)
    resR = rollout_policy_on_exogenous_X(cm, X_path, "rl", rl_model=rl_model, action_max=1.0, X_ref=X_ref)

    row = {
        "scenario": name,
        "unhedged_std": summarize(res0["NII"])["std"],
        "lq_std": summarize(resL["NII"])["std"],
        "rl_std": summarize(resR["NII"])["std"],
        "unhedged_p05": summarize(res0["NII"])["p05"],
        "lq_p05": summarize(resL["NII"])["p05"],
        "rl_p05": summarize(resR["NII"])["p05"],
        "lq_turnover": float(np.mean(np.abs(resL["u"]))),
        "rl_turnover": float(np.mean(np.abs(resR["u"]))),
        "lq_inv": float(np.mean(np.abs(resL["h"]))),
        "rl_inv": float(np.mean(np.abs(resR["h"]))),
    }

    # plots
    import matplotlib.pyplot as plt
    plt.figure(figsize=(9,4))
    plt.plot(res0["NII"], label="Unhedged")
    plt.plot(resL["NII"], label="LQ")
    plt.plot(resR["NII"], label="RL (SAC)")
    plt.title(name)
    plt.xlabel("t (months)")
    plt.ylabel("NII")
    plt.legend()
    plt.tight_layout()
    plt.show()

    return row

# Build K_lq once using chosen lambdas and the same build_Hx_QR_from_nii + Riccati pipeline
H_x, Q_s, R = build_Hx_QR_from_nii(
    cm, X_ref, A_notional, L_notional, tau_A, tau_L, dt,
    alpha_nii=1.0, lambda_h=lambda_h_star, lambda_u=lambda_u_star
)
A_lq, B_lq = build_AB_from_Phi(Phi_hat)
P_lq, K_lq = solve_discrete_riccati(A_lq, B_lq, Q_s, R)

rows = []
rows.append(compare_on_scenario("Stress: +200bp parallel", X_parallel, K_lq, model))
rows.append(compare_on_scenario("Stress: bear steepener", X_steepen, K_lq, model))
rows.append(compare_on_scenario("Stress: high vol x3", X_highvol, K_lq, model))

import pandas as pd
df_compare = pd.DataFrame(rows)
df_compare
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\3068264565.py:22: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  u_t = -float(K_lq @ x_t)
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
Out[94]:
scenario unhedged_std lq_std rl_std unhedged_p05 lq_p05 rl_p05 lq_turnover rl_turnover lq_inv rl_inv
0 Stress: +200bp parallel 0.011522 0.010117 0.011566 0.021739 0.020354 0.021979 0.050177 0.005414 14.585011 1.776519
1 Stress: bear steepener 0.015246 0.013545 0.015247 0.021091 0.019772 0.021321 0.053618 0.005418 14.777920 1.569327
2 Stress: high vol x3 0.031582 0.028799 0.031851 -0.010008 -0.010316 -0.010063 0.090683 0.012188 13.971398 1.502951
In [95]:
compare_on_scenario("Stress: high vol x3", X_highvol, K_lq, model)
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\3068264565.py:22: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  u_t = -float(K_lq @ x_t)
No description has been provided for this image
Out[95]:
{'scenario': 'Stress: high vol x3',
 'unhedged_std': 0.03158229651939372,
 'lq_std': 0.028799311337273092,
 'rl_std': 0.031850762210246804,
 'unhedged_p05': -0.01000792378107949,
 'lq_p05': -0.010316395757328036,
 'rl_p05': -0.010062760747148576,
 'lq_turnover': 0.09068329468290291,
 'rl_turnover': 0.01218796659398962,
 'lq_inv': 13.971398221022996,
 'rl_inv': 1.5029505758307546}

L1 transaction costs (bid–ask / fees): a realistic non-quadratic objective¶

We replace the quadratic turnover penalty with an L1 cost:

$ \text{Cost}_t = \text{NII}_{t+1}^2 + \lambda_h h_t^2 + \kappa_u |u_t| $

This change is small but economically meaningful:

  • trading a little still costs something (no smooth quadratic approximation),
  • optimal behavior often includes “no-trade” regions,
  • LQ is no longer optimal because the objective is not quadratic.

This is the key reason RL is included: it can learn sparse trading and inventory-aware behavior under realistic frictions.

In [96]:
class IrrbbNiiHedgeEnvL1(gym.Env):
    """
    IRRBB NII hedging environment with L1 transaction costs.

    State: [L, S, C, h]  (AFNS factors + hedge inventory)
    Action: u in [-1,1], scaled to Δh via u_scale

    Reward:
        r_t = - [ NII_{t+1}^2 + lambda_h * h_t^2 + kappa_u * |u_t| ]

    This breaks LQ assumptions and favors sparse trading.
    """

    metadata = {"render_modes": []}

    def __init__(
        self,
        cm,
        c_hat,
        Phi_hat,
        Sigma_hat,
        X_ref,
        A_notional=100.0,
        L_notional=100.0,
        tau_A=3.0,
        tau_L=1.0,
        dt=1 / 12,
        lambda_h=1e-6,
        kappa_u=3e-4,
        u_scale=25.0,
        no_trade_eps=0.0,
        episode_len=240,
        center_state=True,
        seed=123,
    ):
        super().__init__()

        self.cm = cm
        self.c = np.asarray(c_hat, float)
        self.Phi = np.asarray(Phi_hat, float)
        self.Sigma = np.asarray(Sigma_hat, float)
        self.X_ref = np.asarray(X_ref, float)

        self.A_notional = A_notional
        self.L_notional = L_notional
        self.tau_A = tau_A
        self.tau_L = tau_L
        self.dt = dt

        self.lambda_h = lambda_h
        self.kappa_u = kappa_u
        self.u_scale = u_scale
        self.no_trade_eps = no_trade_eps

        self.episode_len = episode_len
        self.center_state = center_state
        self.rng = np.random.default_rng(seed)

        # State: (L, S, C, h)
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32
        )

        # Action: scalar in [-1, 1]
        self.action_space = spaces.Box(
            low=-1.0, high=1.0, shape=(1,), dtype=np.float32
        )

        self.t = 0
        self.X = None
        self.h = None

    # ---------- AFNS helpers ----------

    def _y(self, X, tau):
        return float(
            afns_yields_from_factors(
                X,
                np.array([tau]),
                self.cm.lam,
                self.cm.sig1,
                self.cm.sig2,
                self.cm.sig3,
            )[0]
        )

    def _fwd_cc(self, X, tau1, tau2):
        y1 = self._y(X, tau1)
        y2 = self._y(X, tau2)
        return (tau2 * y2 - tau1 * y1) / (tau2 - tau1)

    def _nii_step(self, X_t, X_next, h_t):
        yA = self._y(X_t, self.tau_A)
        yL = self._y(X_t, self.tau_L)

        K_t = self._fwd_cc(X_t, self.tau_L, self.tau_L + self.dt)
        y_float_next = self._y(X_next, self.tau_L)

        return (
            self.A_notional * yA * self.dt
            - self.L_notional * yL * self.dt
            + h_t * (K_t - y_float_next) * self.dt
        )

    # ---------- Gym API ----------

    def _obs(self):
        obs = np.array([self.X[0], self.X[1], self.X[2], self.h], dtype=np.float32)
        if self.center_state:
            obs[:3] -= self.X_ref.astype(np.float32)
        return obs

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng = np.random.default_rng(seed)

        self.t = 0
        self.X = self.X_ref + self.rng.multivariate_normal(
            np.zeros(3), 0.1 * self.Sigma
        )
        self.h = 0.0
        return self._obs(), {}

    def step(self, action):
        u_raw = float(np.clip(action[0], -1.0, 1.0))
        u = u_raw * self.u_scale

        if abs(u) < self.no_trade_eps:
            u = 0.0

        # Next factors
        eps = self.rng.multivariate_normal(np.zeros(3), self.Sigma)
        X_next = self.c + self.Phi @ self.X + eps

        nii = self._nii_step(self.X, X_next, self.h)

        # Update hedge
        h_next = self.h + u

        # L1 reward
        reward = -(
            nii**2 + self.lambda_h * (self.h**2) + self.kappa_u * abs(u)
        )

        self.X = X_next
        self.h = h_next
        self.t += 1

        terminated = False
        truncated = self.t >= self.episode_len

        info = {"nii": nii, "h": self.h, "u": u}

        return self._obs(), float(reward), terminated, truncated, info
In [97]:
def train_sac_l1(
    cm,
    X_smooth,
    c_hat,
    Phi_hat,
    Sigma_hat,
    X_ref,
    lambda_h,
    kappa_u,
    total_timesteps=800_000,
    n_envs=8,
    u_scale=25.0,
    no_trade_eps=0.0,
    seed=123,
):
    def make_env():
        return IrrbbNiiHedgeEnvL1(
            cm=cm,
            c_hat=c_hat,
            Phi_hat=Phi_hat,
            Sigma_hat=Sigma_hat,
            X_ref=X_ref,
            lambda_h=lambda_h,
            kappa_u=kappa_u,
            u_scale=u_scale,
            no_trade_eps=no_trade_eps,
            seed=seed,
        )

    vec_env = make_vec_env(make_env, n_envs=n_envs)

    model = SAC(
        "MlpPolicy",
        vec_env,
        learning_rate=3e-4,
        batch_size=256,
        buffer_size=300_000,
        gamma=0.99,
        tau=0.005,
        train_freq=1,
        gradient_steps=1,
        verbose=1,
    )

    model.learn(total_timesteps=total_timesteps)
    return model
In [98]:
def eval_L1_cost(NII, h, u, lambda_h, kappa_u):
    """
    Average L1-based economic cost.
    """
    return float(
        np.mean(NII**2 + lambda_h * (h[:-1] ** 2) + kappa_u * np.abs(u))
    )
In [100]:
lambda_h_rl = lambda_h_star   # keep inventory discipline
kappa_u_rl  = 3e-4            # L1 trading cost
u_scale     = 25.0            # hedge impact scale
In [101]:
model_l1 = train_sac_l1(
    cm=cm,
    X_smooth=X_smooth,
    c_hat=c_hat,
    Phi_hat=Phi_hat,
    Sigma_hat=Sigma_hat,
    X_ref=X_ref,
    lambda_h=lambda_h_rl,
    kappa_u=kappa_u_rl,
    total_timesteps=800_000,   # increase to 1–2M if needed
    n_envs=8,
    u_scale=u_scale,
    no_trade_eps=0.0           # later try 0.5 or 1.0
)
Using cpu device
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -2.42    |
| time/              |          |
|    episodes        | 4        |
|    fps             | 148      |
|    time_elapsed    | 12       |
|    total_timesteps | 1920     |
| train/             |          |
|    actor_loss      | 0.77     |
|    critic_loss     | 0.272    |
|    ent_coef        | 0.934    |
|    ent_coef_loss   | -0.105   |
|    learning_rate   | 0.0003   |
|    n_updates       | 227      |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -2.42    |
| time/              |          |
|    episodes        | 8        |
|    fps             | 148      |
|    time_elapsed    | 12       |
|    total_timesteps | 1920     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -2.07    |
| time/              |          |
|    episodes        | 12       |
|    fps             | 144      |
|    time_elapsed    | 26       |
|    total_timesteps | 3840     |
| train/             |          |
|    actor_loss      | -0.389   |
|    critic_loss     | 0.0759   |
|    ent_coef        | 0.869    |
|    ent_coef_loss   | -0.234   |
|    learning_rate   | 0.0003   |
|    n_updates       | 467      |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -2.07    |
| time/              |          |
|    episodes        | 16       |
|    fps             | 144      |
|    time_elapsed    | 26       |
|    total_timesteps | 3840     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.93    |
| time/              |          |
|    episodes        | 20       |
|    fps             | 143      |
|    time_elapsed    | 40       |
|    total_timesteps | 5760     |
| train/             |          |
|    actor_loss      | -0.938   |
|    critic_loss     | 0.0514   |
|    ent_coef        | 0.809    |
|    ent_coef_loss   | -0.325   |
|    learning_rate   | 0.0003   |
|    n_updates       | 707      |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.93    |
| time/              |          |
|    episodes        | 24       |
|    fps             | 143      |
|    time_elapsed    | 40       |
|    total_timesteps | 5760     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.82    |
| time/              |          |
|    episodes        | 28       |
|    fps             | 143      |
|    time_elapsed    | 53       |
|    total_timesteps | 7680     |
| train/             |          |
|    actor_loss      | -1.49    |
|    critic_loss     | 0.0354   |
|    ent_coef        | 0.754    |
|    ent_coef_loss   | -0.46    |
|    learning_rate   | 0.0003   |
|    n_updates       | 947      |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.82    |
| time/              |          |
|    episodes        | 32       |
|    fps             | 143      |
|    time_elapsed    | 53       |
|    total_timesteps | 7680     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.77    |
| time/              |          |
|    episodes        | 36       |
|    fps             | 144      |
|    time_elapsed    | 66       |
|    total_timesteps | 9600     |
| train/             |          |
|    actor_loss      | -1.92    |
|    critic_loss     | 0.0337   |
|    ent_coef        | 0.701    |
|    ent_coef_loss   | -0.576   |
|    learning_rate   | 0.0003   |
|    n_updates       | 1187     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.77    |
| time/              |          |
|    episodes        | 40       |
|    fps             | 144      |
|    time_elapsed    | 66       |
|    total_timesteps | 9600     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 44       |
|    fps             | 143      |
|    time_elapsed    | 80       |
|    total_timesteps | 11520    |
| train/             |          |
|    actor_loss      | -2.31    |
|    critic_loss     | 0.0165   |
|    ent_coef        | 0.653    |
|    ent_coef_loss   | -0.677   |
|    learning_rate   | 0.0003   |
|    n_updates       | 1427     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 48       |
|    fps             | 143      |
|    time_elapsed    | 80       |
|    total_timesteps | 11520    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 52       |
|    fps             | 143      |
|    time_elapsed    | 93       |
|    total_timesteps | 13440    |
| train/             |          |
|    actor_loss      | -2.66    |
|    critic_loss     | 0.0179   |
|    ent_coef        | 0.607    |
|    ent_coef_loss   | -0.811   |
|    learning_rate   | 0.0003   |
|    n_updates       | 1667     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 56       |
|    fps             | 143      |
|    time_elapsed    | 93       |
|    total_timesteps | 13440    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 60       |
|    fps             | 143      |
|    time_elapsed    | 107      |
|    total_timesteps | 15360    |
| train/             |          |
|    actor_loss      | -2.95    |
|    critic_loss     | 0.0265   |
|    ent_coef        | 0.564    |
|    ent_coef_loss   | -0.935   |
|    learning_rate   | 0.0003   |
|    n_updates       | 1907     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 64       |
|    fps             | 143      |
|    time_elapsed    | 107      |
|    total_timesteps | 15360    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.64    |
| time/              |          |
|    episodes        | 68       |
|    fps             | 142      |
|    time_elapsed    | 121      |
|    total_timesteps | 17280    |
| train/             |          |
|    actor_loss      | -3.28    |
|    critic_loss     | 0.0394   |
|    ent_coef        | 0.525    |
|    ent_coef_loss   | -1.03    |
|    learning_rate   | 0.0003   |
|    n_updates       | 2147     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.64    |
| time/              |          |
|    episodes        | 72       |
|    fps             | 142      |
|    time_elapsed    | 121      |
|    total_timesteps | 17280    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.63    |
| time/              |          |
|    episodes        | 76       |
|    fps             | 142      |
|    time_elapsed    | 134      |
|    total_timesteps | 19200    |
| train/             |          |
|    actor_loss      | -3.64    |
|    critic_loss     | 0.0172   |
|    ent_coef        | 0.488    |
|    ent_coef_loss   | -1.18    |
|    learning_rate   | 0.0003   |
|    n_updates       | 2387     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.63    |
| time/              |          |
|    episodes        | 80       |
|    fps             | 142      |
|    time_elapsed    | 134      |
|    total_timesteps | 19200    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 84       |
|    fps             | 142      |
|    time_elapsed    | 147      |
|    total_timesteps | 21120    |
| train/             |          |
|    actor_loss      | -3.82    |
|    critic_loss     | 0.0178   |
|    ent_coef        | 0.454    |
|    ent_coef_loss   | -1.28    |
|    learning_rate   | 0.0003   |
|    n_updates       | 2627     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 88       |
|    fps             | 142      |
|    time_elapsed    | 147      |
|    total_timesteps | 21120    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.64    |
| time/              |          |
|    episodes        | 92       |
|    fps             | 142      |
|    time_elapsed    | 161      |
|    total_timesteps | 23040    |
| train/             |          |
|    actor_loss      | -4       |
|    critic_loss     | 0.0157   |
|    ent_coef        | 0.422    |
|    ent_coef_loss   | -1.4     |
|    learning_rate   | 0.0003   |
|    n_updates       | 2867     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.64    |
| time/              |          |
|    episodes        | 96       |
|    fps             | 142      |
|    time_elapsed    | 161      |
|    total_timesteps | 23040    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.61    |
| time/              |          |
|    episodes        | 100      |
|    fps             | 143      |
|    time_elapsed    | 174      |
|    total_timesteps | 24960    |
| train/             |          |
|    actor_loss      | -4.31    |
|    critic_loss     | 0.00948  |
|    ent_coef        | 0.393    |
|    ent_coef_loss   | -1.54    |
|    learning_rate   | 0.0003   |
|    n_updates       | 3107     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.61    |
| time/              |          |
|    episodes        | 104      |
|    fps             | 143      |
|    time_elapsed    | 174      |
|    total_timesteps | 24960    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.59    |
| time/              |          |
|    episodes        | 108      |
|    fps             | 143      |
|    time_elapsed    | 186      |
|    total_timesteps | 26880    |
| train/             |          |
|    actor_loss      | -4.39    |
|    critic_loss     | 0.0143   |
|    ent_coef        | 0.365    |
|    ent_coef_loss   | -1.64    |
|    learning_rate   | 0.0003   |
|    n_updates       | 3347     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.59    |
| time/              |          |
|    episodes        | 112      |
|    fps             | 143      |
|    time_elapsed    | 186      |
|    total_timesteps | 26880    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.62    |
| time/              |          |
|    episodes        | 116      |
|    fps             | 144      |
|    time_elapsed    | 199      |
|    total_timesteps | 28800    |
| train/             |          |
|    actor_loss      | -4.64    |
|    critic_loss     | 0.00457  |
|    ent_coef        | 0.339    |
|    ent_coef_loss   | -1.77    |
|    learning_rate   | 0.0003   |
|    n_updates       | 3587     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.62    |
| time/              |          |
|    episodes        | 120      |
|    fps             | 144      |
|    time_elapsed    | 199      |
|    total_timesteps | 28800    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.62    |
| time/              |          |
|    episodes        | 124      |
|    fps             | 144      |
|    time_elapsed    | 212      |
|    total_timesteps | 30720    |
| train/             |          |
|    actor_loss      | -4.72    |
|    critic_loss     | 0.0119   |
|    ent_coef        | 0.316    |
|    ent_coef_loss   | -1.88    |
|    learning_rate   | 0.0003   |
|    n_updates       | 3827     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.62    |
| time/              |          |
|    episodes        | 128      |
|    fps             | 144      |
|    time_elapsed    | 212      |
|    total_timesteps | 30720    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.62    |
| time/              |          |
|    episodes        | 132      |
|    fps             | 144      |
|    time_elapsed    | 226      |
|    total_timesteps | 32640    |
| train/             |          |
|    actor_loss      | -4.9     |
|    critic_loss     | 0.00466  |
|    ent_coef        | 0.294    |
|    ent_coef_loss   | -1.98    |
|    learning_rate   | 0.0003   |
|    n_updates       | 4067     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.62    |
| time/              |          |
|    episodes        | 136      |
|    fps             | 144      |
|    time_elapsed    | 226      |
|    total_timesteps | 32640    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 140      |
|    fps             | 143      |
|    time_elapsed    | 240      |
|    total_timesteps | 34560    |
| train/             |          |
|    actor_loss      | -5       |
|    critic_loss     | 0.0024   |
|    ent_coef        | 0.273    |
|    ent_coef_loss   | -2.16    |
|    learning_rate   | 0.0003   |
|    n_updates       | 4307     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 144      |
|    fps             | 143      |
|    time_elapsed    | 240      |
|    total_timesteps | 34560    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 148      |
|    fps             | 143      |
|    time_elapsed    | 253      |
|    total_timesteps | 36480    |
| train/             |          |
|    actor_loss      | -5.02    |
|    critic_loss     | 0.00781  |
|    ent_coef        | 0.254    |
|    ent_coef_loss   | -2.18    |
|    learning_rate   | 0.0003   |
|    n_updates       | 4547     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 152      |
|    fps             | 143      |
|    time_elapsed    | 253      |
|    total_timesteps | 36480    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 156      |
|    fps             | 144      |
|    time_elapsed    | 266      |
|    total_timesteps | 38400    |
| train/             |          |
|    actor_loss      | -5.2     |
|    critic_loss     | 0.00368  |
|    ent_coef        | 0.236    |
|    ent_coef_loss   | -2.3     |
|    learning_rate   | 0.0003   |
|    n_updates       | 4787     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 160      |
|    fps             | 144      |
|    time_elapsed    | 266      |
|    total_timesteps | 38400    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 164      |
|    fps             | 143      |
|    time_elapsed    | 280      |
|    total_timesteps | 40320    |
| train/             |          |
|    actor_loss      | -5.21    |
|    critic_loss     | 0.00507  |
|    ent_coef        | 0.22     |
|    ent_coef_loss   | -2.42    |
|    learning_rate   | 0.0003   |
|    n_updates       | 5027     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 168      |
|    fps             | 143      |
|    time_elapsed    | 280      |
|    total_timesteps | 40320    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.73    |
| time/              |          |
|    episodes        | 172      |
|    fps             | 143      |
|    time_elapsed    | 293      |
|    total_timesteps | 42240    |
| train/             |          |
|    actor_loss      | -5.33    |
|    critic_loss     | 0.00442  |
|    ent_coef        | 0.205    |
|    ent_coef_loss   | -2.6     |
|    learning_rate   | 0.0003   |
|    n_updates       | 5267     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.73    |
| time/              |          |
|    episodes        | 176      |
|    fps             | 143      |
|    time_elapsed    | 293      |
|    total_timesteps | 42240    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 180      |
|    fps             | 144      |
|    time_elapsed    | 305      |
|    total_timesteps | 44160    |
| train/             |          |
|    actor_loss      | -5.38    |
|    critic_loss     | 0.00173  |
|    ent_coef        | 0.19     |
|    ent_coef_loss   | -2.73    |
|    learning_rate   | 0.0003   |
|    n_updates       | 5507     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 184      |
|    fps             | 144      |
|    time_elapsed    | 305      |
|    total_timesteps | 44160    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 188      |
|    fps             | 147      |
|    time_elapsed    | 311      |
|    total_timesteps | 46080    |
| train/             |          |
|    actor_loss      | -5.43    |
|    critic_loss     | 0.0016   |
|    ent_coef        | 0.177    |
|    ent_coef_loss   | -2.86    |
|    learning_rate   | 0.0003   |
|    n_updates       | 5747     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 192      |
|    fps             | 147      |
|    time_elapsed    | 311      |
|    total_timesteps | 46080    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.73    |
| time/              |          |
|    episodes        | 196      |
|    fps             | 150      |
|    time_elapsed    | 318      |
|    total_timesteps | 48000    |
| train/             |          |
|    actor_loss      | -5.47    |
|    critic_loss     | 0.00188  |
|    ent_coef        | 0.165    |
|    ent_coef_loss   | -2.95    |
|    learning_rate   | 0.0003   |
|    n_updates       | 5987     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.73    |
| time/              |          |
|    episodes        | 200      |
|    fps             | 150      |
|    time_elapsed    | 318      |
|    total_timesteps | 48000    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.73    |
| time/              |          |
|    episodes        | 204      |
|    fps             | 152      |
|    time_elapsed    | 328      |
|    total_timesteps | 49920    |
| train/             |          |
|    actor_loss      | -5.5     |
|    critic_loss     | 0.00158  |
|    ent_coef        | 0.153    |
|    ent_coef_loss   | -3.06    |
|    learning_rate   | 0.0003   |
|    n_updates       | 6227     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.73    |
| time/              |          |
|    episodes        | 208      |
|    fps             | 152      |
|    time_elapsed    | 328      |
|    total_timesteps | 49920    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 212      |
|    fps             | 151      |
|    time_elapsed    | 342      |
|    total_timesteps | 51840    |
| train/             |          |
|    actor_loss      | -5.54    |
|    critic_loss     | 0.000549 |
|    ent_coef        | 0.143    |
|    ent_coef_loss   | -3.15    |
|    learning_rate   | 0.0003   |
|    n_updates       | 6467     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 216      |
|    fps             | 151      |
|    time_elapsed    | 342      |
|    total_timesteps | 51840    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 220      |
|    fps             | 152      |
|    time_elapsed    | 351      |
|    total_timesteps | 53760    |
| train/             |          |
|    actor_loss      | -5.53    |
|    critic_loss     | 0.00162  |
|    ent_coef        | 0.133    |
|    ent_coef_loss   | -3.29    |
|    learning_rate   | 0.0003   |
|    n_updates       | 6707     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 224      |
|    fps             | 152      |
|    time_elapsed    | 351      |
|    total_timesteps | 53760    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 228      |
|    fps             | 154      |
|    time_elapsed    | 360      |
|    total_timesteps | 55680    |
| train/             |          |
|    actor_loss      | -5.53    |
|    critic_loss     | 0.0054   |
|    ent_coef        | 0.123    |
|    ent_coef_loss   | -3.47    |
|    learning_rate   | 0.0003   |
|    n_updates       | 6947     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 232      |
|    fps             | 154      |
|    time_elapsed    | 360      |
|    total_timesteps | 55680    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 236      |
|    fps             | 156      |
|    time_elapsed    | 368      |
|    total_timesteps | 57600    |
| train/             |          |
|    actor_loss      | -5.5     |
|    critic_loss     | 0.00273  |
|    ent_coef        | 0.115    |
|    ent_coef_loss   | -3.6     |
|    learning_rate   | 0.0003   |
|    n_updates       | 7187     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 240      |
|    fps             | 156      |
|    time_elapsed    | 368      |
|    total_timesteps | 57600    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 244      |
|    fps             | 157      |
|    time_elapsed    | 376      |
|    total_timesteps | 59520    |
| train/             |          |
|    actor_loss      | -5.56    |
|    critic_loss     | 0.00132  |
|    ent_coef        | 0.107    |
|    ent_coef_loss   | -3.64    |
|    learning_rate   | 0.0003   |
|    n_updates       | 7427     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 248      |
|    fps             | 157      |
|    time_elapsed    | 376      |
|    total_timesteps | 59520    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 252      |
|    fps             | 159      |
|    time_elapsed    | 384      |
|    total_timesteps | 61440    |
| train/             |          |
|    actor_loss      | -5.53    |
|    critic_loss     | 0.00157  |
|    ent_coef        | 0.0994   |
|    ent_coef_loss   | -3.77    |
|    learning_rate   | 0.0003   |
|    n_updates       | 7667     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 256      |
|    fps             | 159      |
|    time_elapsed    | 384      |
|    total_timesteps | 61440    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 260      |
|    fps             | 162      |
|    time_elapsed    | 391      |
|    total_timesteps | 63360    |
| train/             |          |
|    actor_loss      | -5.5     |
|    critic_loss     | 0.00114  |
|    ent_coef        | 0.0925   |
|    ent_coef_loss   | -3.94    |
|    learning_rate   | 0.0003   |
|    n_updates       | 7907     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.7     |
| time/              |          |
|    episodes        | 264      |
|    fps             | 162      |
|    time_elapsed    | 391      |
|    total_timesteps | 63360    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 268      |
|    fps             | 164      |
|    time_elapsed    | 397      |
|    total_timesteps | 65280    |
| train/             |          |
|    actor_loss      | -5.47    |
|    critic_loss     | 0.000751 |
|    ent_coef        | 0.0861   |
|    ent_coef_loss   | -4.03    |
|    learning_rate   | 0.0003   |
|    n_updates       | 8147     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 272      |
|    fps             | 164      |
|    time_elapsed    | 397      |
|    total_timesteps | 65280    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 276      |
|    fps             | 166      |
|    time_elapsed    | 403      |
|    total_timesteps | 67200    |
| train/             |          |
|    actor_loss      | -5.48    |
|    critic_loss     | 0.00186  |
|    ent_coef        | 0.0801   |
|    ent_coef_loss   | -4.1     |
|    learning_rate   | 0.0003   |
|    n_updates       | 8387     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.71    |
| time/              |          |
|    episodes        | 280      |
|    fps             | 166      |
|    time_elapsed    | 403      |
|    total_timesteps | 67200    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.72    |
| time/              |          |
|    episodes        | 284      |
|    fps             | 168      |
|    time_elapsed    | 409      |
|    total_timesteps | 69120    |
| train/             |          |
|    actor_loss      | -5.46    |
|    critic_loss     | 0.000673 |
|    ent_coef        | 0.0745   |
|    ent_coef_loss   | -4.28    |
|    learning_rate   | 0.0003   |
|    n_updates       | 8627     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.72    |
| time/              |          |
|    episodes        | 288      |
|    fps             | 168      |
|    time_elapsed    | 409      |
|    total_timesteps | 69120    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 292      |
|    fps             | 170      |
|    time_elapsed    | 415      |
|    total_timesteps | 71040    |
| train/             |          |
|    actor_loss      | -5.42    |
|    critic_loss     | 0.00792  |
|    ent_coef        | 0.0694   |
|    ent_coef_loss   | -4.44    |
|    learning_rate   | 0.0003   |
|    n_updates       | 8867     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 296      |
|    fps             | 170      |
|    time_elapsed    | 415      |
|    total_timesteps | 71040    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 300      |
|    fps             | 172      |
|    time_elapsed    | 422      |
|    total_timesteps | 72960    |
| train/             |          |
|    actor_loss      | -5.36    |
|    critic_loss     | 0.000957 |
|    ent_coef        | 0.0646   |
|    ent_coef_loss   | -4.49    |
|    learning_rate   | 0.0003   |
|    n_updates       | 9107     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 304      |
|    fps             | 172      |
|    time_elapsed    | 422      |
|    total_timesteps | 72960    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 308      |
|    fps             | 174      |
|    time_elapsed    | 429      |
|    total_timesteps | 74880    |
| train/             |          |
|    actor_loss      | -5.3     |
|    critic_loss     | 0.0058   |
|    ent_coef        | 0.0601   |
|    ent_coef_loss   | -4.67    |
|    learning_rate   | 0.0003   |
|    n_updates       | 9347     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 312      |
|    fps             | 174      |
|    time_elapsed    | 429      |
|    total_timesteps | 74880    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.66    |
| time/              |          |
|    episodes        | 316      |
|    fps             | 176      |
|    time_elapsed    | 435      |
|    total_timesteps | 76800    |
| train/             |          |
|    actor_loss      | -5.27    |
|    critic_loss     | 0.0018   |
|    ent_coef        | 0.0559   |
|    ent_coef_loss   | -4.65    |
|    learning_rate   | 0.0003   |
|    n_updates       | 9587     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.66    |
| time/              |          |
|    episodes        | 320      |
|    fps             | 176      |
|    time_elapsed    | 435      |
|    total_timesteps | 76800    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 324      |
|    fps             | 178      |
|    time_elapsed    | 442      |
|    total_timesteps | 78720    |
| train/             |          |
|    actor_loss      | -5.25    |
|    critic_loss     | 0.00153  |
|    ent_coef        | 0.052    |
|    ent_coef_loss   | -4.82    |
|    learning_rate   | 0.0003   |
|    n_updates       | 9827     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 328      |
|    fps             | 178      |
|    time_elapsed    | 442      |
|    total_timesteps | 78720    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 332      |
|    fps             | 179      |
|    time_elapsed    | 448      |
|    total_timesteps | 80640    |
| train/             |          |
|    actor_loss      | -5.21    |
|    critic_loss     | 0.000667 |
|    ent_coef        | 0.0484   |
|    ent_coef_loss   | -4.96    |
|    learning_rate   | 0.0003   |
|    n_updates       | 10067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 336      |
|    fps             | 179      |
|    time_elapsed    | 448      |
|    total_timesteps | 80640    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 340      |
|    fps             | 181      |
|    time_elapsed    | 454      |
|    total_timesteps | 82560    |
| train/             |          |
|    actor_loss      | -5.21    |
|    critic_loss     | 0.00375  |
|    ent_coef        | 0.0451   |
|    ent_coef_loss   | -5.13    |
|    learning_rate   | 0.0003   |
|    n_updates       | 10307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.69    |
| time/              |          |
|    episodes        | 344      |
|    fps             | 181      |
|    time_elapsed    | 454      |
|    total_timesteps | 82560    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.66    |
| time/              |          |
|    episodes        | 348      |
|    fps             | 183      |
|    time_elapsed    | 460      |
|    total_timesteps | 84480    |
| train/             |          |
|    actor_loss      | -5.15    |
|    critic_loss     | 0.000259 |
|    ent_coef        | 0.0419   |
|    ent_coef_loss   | -5.26    |
|    learning_rate   | 0.0003   |
|    n_updates       | 10547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.66    |
| time/              |          |
|    episodes        | 352      |
|    fps             | 183      |
|    time_elapsed    | 460      |
|    total_timesteps | 84480    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 356      |
|    fps             | 184      |
|    time_elapsed    | 467      |
|    total_timesteps | 86400    |
| train/             |          |
|    actor_loss      | -5.11    |
|    critic_loss     | 0.000264 |
|    ent_coef        | 0.039    |
|    ent_coef_loss   | -5.29    |
|    learning_rate   | 0.0003   |
|    n_updates       | 10787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 360      |
|    fps             | 184      |
|    time_elapsed    | 467      |
|    total_timesteps | 86400    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 364      |
|    fps             | 186      |
|    time_elapsed    | 473      |
|    total_timesteps | 88320    |
| train/             |          |
|    actor_loss      | -4.98    |
|    critic_loss     | 0.00584  |
|    ent_coef        | 0.0363   |
|    ent_coef_loss   | -5.49    |
|    learning_rate   | 0.0003   |
|    n_updates       | 11027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.68    |
| time/              |          |
|    episodes        | 368      |
|    fps             | 186      |
|    time_elapsed    | 473      |
|    total_timesteps | 88320    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 372      |
|    fps             | 188      |
|    time_elapsed    | 479      |
|    total_timesteps | 90240    |
| train/             |          |
|    actor_loss      | -4.95    |
|    critic_loss     | 0.00339  |
|    ent_coef        | 0.0338   |
|    ent_coef_loss   | -5.59    |
|    learning_rate   | 0.0003   |
|    n_updates       | 11267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 376      |
|    fps             | 188      |
|    time_elapsed    | 479      |
|    total_timesteps | 90240    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 380      |
|    fps             | 189      |
|    time_elapsed    | 486      |
|    total_timesteps | 92160    |
| train/             |          |
|    actor_loss      | -4.95    |
|    critic_loss     | 0.00048  |
|    ent_coef        | 0.0314   |
|    ent_coef_loss   | -5.51    |
|    learning_rate   | 0.0003   |
|    n_updates       | 11507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 384      |
|    fps             | 189      |
|    time_elapsed    | 486      |
|    total_timesteps | 92160    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 388      |
|    fps             | 190      |
|    time_elapsed    | 494      |
|    total_timesteps | 94080    |
| train/             |          |
|    actor_loss      | -4.92    |
|    critic_loss     | 0.000432 |
|    ent_coef        | 0.0293   |
|    ent_coef_loss   | -5.72    |
|    learning_rate   | 0.0003   |
|    n_updates       | 11747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 392      |
|    fps             | 190      |
|    time_elapsed    | 494      |
|    total_timesteps | 94080    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 396      |
|    fps             | 190      |
|    time_elapsed    | 502      |
|    total_timesteps | 96000    |
| train/             |          |
|    actor_loss      | -4.86    |
|    critic_loss     | 0.00018  |
|    ent_coef        | 0.0272   |
|    ent_coef_loss   | -5.83    |
|    learning_rate   | 0.0003   |
|    n_updates       | 11987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.65    |
| time/              |          |
|    episodes        | 400      |
|    fps             | 190      |
|    time_elapsed    | 502      |
|    total_timesteps | 96000    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.63    |
| time/              |          |
|    episodes        | 404      |
|    fps             | 191      |
|    time_elapsed    | 510      |
|    total_timesteps | 97920    |
| train/             |          |
|    actor_loss      | -4.81    |
|    critic_loss     | 0.000426 |
|    ent_coef        | 0.0254   |
|    ent_coef_loss   | -5.89    |
|    learning_rate   | 0.0003   |
|    n_updates       | 12227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.63    |
| time/              |          |
|    episodes        | 408      |
|    fps             | 191      |
|    time_elapsed    | 510      |
|    total_timesteps | 97920    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 412      |
|    fps             | 192      |
|    time_elapsed    | 517      |
|    total_timesteps | 99840    |
| train/             |          |
|    actor_loss      | -4.76    |
|    critic_loss     | 0.00282  |
|    ent_coef        | 0.0236   |
|    ent_coef_loss   | -6.06    |
|    learning_rate   | 0.0003   |
|    n_updates       | 12467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 416      |
|    fps             | 192      |
|    time_elapsed    | 517      |
|    total_timesteps | 99840    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 420      |
|    fps             | 194      |
|    time_elapsed    | 524      |
|    total_timesteps | 101760   |
| train/             |          |
|    actor_loss      | -4.71    |
|    critic_loss     | 0.000315 |
|    ent_coef        | 0.022    |
|    ent_coef_loss   | -6.27    |
|    learning_rate   | 0.0003   |
|    n_updates       | 12707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.67    |
| time/              |          |
|    episodes        | 424      |
|    fps             | 194      |
|    time_elapsed    | 524      |
|    total_timesteps | 101760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.64    |
| time/              |          |
|    episodes        | 428      |
|    fps             | 195      |
|    time_elapsed    | 531      |
|    total_timesteps | 103680   |
| train/             |          |
|    actor_loss      | -4.62    |
|    critic_loss     | 0.0102   |
|    ent_coef        | 0.0205   |
|    ent_coef_loss   | -6.33    |
|    learning_rate   | 0.0003   |
|    n_updates       | 12947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.64    |
| time/              |          |
|    episodes        | 432      |
|    fps             | 195      |
|    time_elapsed    | 531      |
|    total_timesteps | 103680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.62    |
| time/              |          |
|    episodes        | 436      |
|    fps             | 195      |
|    time_elapsed    | 539      |
|    total_timesteps | 105600   |
| train/             |          |
|    actor_loss      | -4.6     |
|    critic_loss     | 0.000208 |
|    ent_coef        | 0.0191   |
|    ent_coef_loss   | -6.36    |
|    learning_rate   | 0.0003   |
|    n_updates       | 13187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.62    |
| time/              |          |
|    episodes        | 440      |
|    fps             | 195      |
|    time_elapsed    | 539      |
|    total_timesteps | 105600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.6     |
| time/              |          |
|    episodes        | 444      |
|    fps             | 196      |
|    time_elapsed    | 547      |
|    total_timesteps | 107520   |
| train/             |          |
|    actor_loss      | -4.54    |
|    critic_loss     | 0.000192 |
|    ent_coef        | 0.0177   |
|    ent_coef_loss   | -6.49    |
|    learning_rate   | 0.0003   |
|    n_updates       | 13427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.6     |
| time/              |          |
|    episodes        | 448      |
|    fps             | 196      |
|    time_elapsed    | 547      |
|    total_timesteps | 107520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.59    |
| time/              |          |
|    episodes        | 452      |
|    fps             | 196      |
|    time_elapsed    | 556      |
|    total_timesteps | 109440   |
| train/             |          |
|    actor_loss      | -4.5     |
|    critic_loss     | 0.0029   |
|    ent_coef        | 0.0165   |
|    ent_coef_loss   | -6.55    |
|    learning_rate   | 0.0003   |
|    n_updates       | 13667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.59    |
| time/              |          |
|    episodes        | 456      |
|    fps             | 196      |
|    time_elapsed    | 556      |
|    total_timesteps | 109440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.58    |
| time/              |          |
|    episodes        | 460      |
|    fps             | 196      |
|    time_elapsed    | 566      |
|    total_timesteps | 111360   |
| train/             |          |
|    actor_loss      | -4.42    |
|    critic_loss     | 0.00189  |
|    ent_coef        | 0.0154   |
|    ent_coef_loss   | -6.66    |
|    learning_rate   | 0.0003   |
|    n_updates       | 13907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.58    |
| time/              |          |
|    episodes        | 464      |
|    fps             | 196      |
|    time_elapsed    | 566      |
|    total_timesteps | 111360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.52    |
| time/              |          |
|    episodes        | 468      |
|    fps             | 196      |
|    time_elapsed    | 575      |
|    total_timesteps | 113280   |
| train/             |          |
|    actor_loss      | -4.4     |
|    critic_loss     | 0.000233 |
|    ent_coef        | 0.0143   |
|    ent_coef_loss   | -6.8     |
|    learning_rate   | 0.0003   |
|    n_updates       | 14147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.52    |
| time/              |          |
|    episodes        | 472      |
|    fps             | 196      |
|    time_elapsed    | 575      |
|    total_timesteps | 113280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.52    |
| time/              |          |
|    episodes        | 476      |
|    fps             | 197      |
|    time_elapsed    | 584      |
|    total_timesteps | 115200   |
| train/             |          |
|    actor_loss      | -4.34    |
|    critic_loss     | 0.000121 |
|    ent_coef        | 0.0133   |
|    ent_coef_loss   | -6.9     |
|    learning_rate   | 0.0003   |
|    n_updates       | 14387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.52    |
| time/              |          |
|    episodes        | 480      |
|    fps             | 197      |
|    time_elapsed    | 584      |
|    total_timesteps | 115200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.5     |
| time/              |          |
|    episodes        | 484      |
|    fps             | 197      |
|    time_elapsed    | 592      |
|    total_timesteps | 117120   |
| train/             |          |
|    actor_loss      | -4.27    |
|    critic_loss     | 0.000476 |
|    ent_coef        | 0.0124   |
|    ent_coef_loss   | -6.9     |
|    learning_rate   | 0.0003   |
|    n_updates       | 14627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.5     |
| time/              |          |
|    episodes        | 488      |
|    fps             | 197      |
|    time_elapsed    | 592      |
|    total_timesteps | 117120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.49    |
| time/              |          |
|    episodes        | 492      |
|    fps             | 198      |
|    time_elapsed    | 600      |
|    total_timesteps | 119040   |
| train/             |          |
|    actor_loss      | -4.22    |
|    critic_loss     | 0.00155  |
|    ent_coef        | 0.0116   |
|    ent_coef_loss   | -7.05    |
|    learning_rate   | 0.0003   |
|    n_updates       | 14867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.49    |
| time/              |          |
|    episodes        | 496      |
|    fps             | 198      |
|    time_elapsed    | 600      |
|    total_timesteps | 119040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.51    |
| time/              |          |
|    episodes        | 500      |
|    fps             | 198      |
|    time_elapsed    | 608      |
|    total_timesteps | 120960   |
| train/             |          |
|    actor_loss      | -4.19    |
|    critic_loss     | 0.000329 |
|    ent_coef        | 0.0108   |
|    ent_coef_loss   | -6.91    |
|    learning_rate   | 0.0003   |
|    n_updates       | 15107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.51    |
| time/              |          |
|    episodes        | 504      |
|    fps             | 198      |
|    time_elapsed    | 608      |
|    total_timesteps | 120960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.48    |
| time/              |          |
|    episodes        | 508      |
|    fps             | 199      |
|    time_elapsed    | 616      |
|    total_timesteps | 122880   |
| train/             |          |
|    actor_loss      | -4.13    |
|    critic_loss     | 0.000231 |
|    ent_coef        | 0.0101   |
|    ent_coef_loss   | -6.97    |
|    learning_rate   | 0.0003   |
|    n_updates       | 15347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.48    |
| time/              |          |
|    episodes        | 512      |
|    fps             | 199      |
|    time_elapsed    | 616      |
|    total_timesteps | 122880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.45    |
| time/              |          |
|    episodes        | 516      |
|    fps             | 199      |
|    time_elapsed    | 624      |
|    total_timesteps | 124800   |
| train/             |          |
|    actor_loss      | -4.08    |
|    critic_loss     | 0.000401 |
|    ent_coef        | 0.00939  |
|    ent_coef_loss   | -6.95    |
|    learning_rate   | 0.0003   |
|    n_updates       | 15587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.45    |
| time/              |          |
|    episodes        | 520      |
|    fps             | 199      |
|    time_elapsed    | 624      |
|    total_timesteps | 124800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.43    |
| time/              |          |
|    episodes        | 524      |
|    fps             | 200      |
|    time_elapsed    | 632      |
|    total_timesteps | 126720   |
| train/             |          |
|    actor_loss      | -4.01    |
|    critic_loss     | 0.000151 |
|    ent_coef        | 0.00875  |
|    ent_coef_loss   | -7.14    |
|    learning_rate   | 0.0003   |
|    n_updates       | 15827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.43    |
| time/              |          |
|    episodes        | 528      |
|    fps             | 200      |
|    time_elapsed    | 632      |
|    total_timesteps | 126720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.41    |
| time/              |          |
|    episodes        | 532      |
|    fps             | 200      |
|    time_elapsed    | 640      |
|    total_timesteps | 128640   |
| train/             |          |
|    actor_loss      | -3.97    |
|    critic_loss     | 0.000531 |
|    ent_coef        | 0.00815  |
|    ent_coef_loss   | -7.45    |
|    learning_rate   | 0.0003   |
|    n_updates       | 16067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.41    |
| time/              |          |
|    episodes        | 536      |
|    fps             | 200      |
|    time_elapsed    | 640      |
|    total_timesteps | 128640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.41    |
| time/              |          |
|    episodes        | 540      |
|    fps             | 201      |
|    time_elapsed    | 649      |
|    total_timesteps | 130560   |
| train/             |          |
|    actor_loss      | -3.92    |
|    critic_loss     | 0.000504 |
|    ent_coef        | 0.0076   |
|    ent_coef_loss   | -7.48    |
|    learning_rate   | 0.0003   |
|    n_updates       | 16307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.41    |
| time/              |          |
|    episodes        | 544      |
|    fps             | 201      |
|    time_elapsed    | 649      |
|    total_timesteps | 130560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.41    |
| time/              |          |
|    episodes        | 548      |
|    fps             | 201      |
|    time_elapsed    | 657      |
|    total_timesteps | 132480   |
| train/             |          |
|    actor_loss      | -3.88    |
|    critic_loss     | 0.000214 |
|    ent_coef        | 0.00708  |
|    ent_coef_loss   | -7.53    |
|    learning_rate   | 0.0003   |
|    n_updates       | 16547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.41    |
| time/              |          |
|    episodes        | 552      |
|    fps             | 201      |
|    time_elapsed    | 657      |
|    total_timesteps | 132480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.4     |
| time/              |          |
|    episodes        | 556      |
|    fps             | 201      |
|    time_elapsed    | 666      |
|    total_timesteps | 134400   |
| train/             |          |
|    actor_loss      | -3.83    |
|    critic_loss     | 0.000813 |
|    ent_coef        | 0.00661  |
|    ent_coef_loss   | -6.56    |
|    learning_rate   | 0.0003   |
|    n_updates       | 16787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.4     |
| time/              |          |
|    episodes        | 560      |
|    fps             | 201      |
|    time_elapsed    | 666      |
|    total_timesteps | 134400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.4     |
| time/              |          |
|    episodes        | 564      |
|    fps             | 201      |
|    time_elapsed    | 675      |
|    total_timesteps | 136320   |
| train/             |          |
|    actor_loss      | -3.76    |
|    critic_loss     | 0.00046  |
|    ent_coef        | 0.00618  |
|    ent_coef_loss   | -6.6     |
|    learning_rate   | 0.0003   |
|    n_updates       | 17027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.4     |
| time/              |          |
|    episodes        | 568      |
|    fps             | 201      |
|    time_elapsed    | 675      |
|    total_timesteps | 136320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.38    |
| time/              |          |
|    episodes        | 572      |
|    fps             | 202      |
|    time_elapsed    | 683      |
|    total_timesteps | 138240   |
| train/             |          |
|    actor_loss      | -3.71    |
|    critic_loss     | 0.000848 |
|    ent_coef        | 0.00578  |
|    ent_coef_loss   | -7.51    |
|    learning_rate   | 0.0003   |
|    n_updates       | 17267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.38    |
| time/              |          |
|    episodes        | 576      |
|    fps             | 202      |
|    time_elapsed    | 683      |
|    total_timesteps | 138240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.36    |
| time/              |          |
|    episodes        | 580      |
|    fps             | 202      |
|    time_elapsed    | 690      |
|    total_timesteps | 140160   |
| train/             |          |
|    actor_loss      | -3.67    |
|    critic_loss     | 0.000322 |
|    ent_coef        | 0.00539  |
|    ent_coef_loss   | -7.13    |
|    learning_rate   | 0.0003   |
|    n_updates       | 17507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.36    |
| time/              |          |
|    episodes        | 584      |
|    fps             | 202      |
|    time_elapsed    | 690      |
|    total_timesteps | 140160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.34    |
| time/              |          |
|    episodes        | 588      |
|    fps             | 203      |
|    time_elapsed    | 699      |
|    total_timesteps | 142080   |
| train/             |          |
|    actor_loss      | -3.62    |
|    critic_loss     | 0.000638 |
|    ent_coef        | 0.00503  |
|    ent_coef_loss   | -6.8     |
|    learning_rate   | 0.0003   |
|    n_updates       | 17747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.34    |
| time/              |          |
|    episodes        | 592      |
|    fps             | 203      |
|    time_elapsed    | 699      |
|    total_timesteps | 142080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.33    |
| time/              |          |
|    episodes        | 596      |
|    fps             | 203      |
|    time_elapsed    | 707      |
|    total_timesteps | 144000   |
| train/             |          |
|    actor_loss      | -3.57    |
|    critic_loss     | 0.000322 |
|    ent_coef        | 0.0047   |
|    ent_coef_loss   | -7.15    |
|    learning_rate   | 0.0003   |
|    n_updates       | 17987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.33    |
| time/              |          |
|    episodes        | 600      |
|    fps             | 203      |
|    time_elapsed    | 707      |
|    total_timesteps | 144000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.33    |
| time/              |          |
|    episodes        | 604      |
|    fps             | 204      |
|    time_elapsed    | 714      |
|    total_timesteps | 145920   |
| train/             |          |
|    actor_loss      | -3.52    |
|    critic_loss     | 0.000115 |
|    ent_coef        | 0.00439  |
|    ent_coef_loss   | -7.48    |
|    learning_rate   | 0.0003   |
|    n_updates       | 18227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.33    |
| time/              |          |
|    episodes        | 608      |
|    fps             | 204      |
|    time_elapsed    | 714      |
|    total_timesteps | 145920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.31    |
| time/              |          |
|    episodes        | 612      |
|    fps             | 204      |
|    time_elapsed    | 721      |
|    total_timesteps | 147840   |
| train/             |          |
|    actor_loss      | -3.47    |
|    critic_loss     | 0.000181 |
|    ent_coef        | 0.00409  |
|    ent_coef_loss   | -7.83    |
|    learning_rate   | 0.0003   |
|    n_updates       | 18467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.31    |
| time/              |          |
|    episodes        | 616      |
|    fps             | 204      |
|    time_elapsed    | 721      |
|    total_timesteps | 147840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.3     |
| time/              |          |
|    episodes        | 620      |
|    fps             | 205      |
|    time_elapsed    | 730      |
|    total_timesteps | 149760   |
| train/             |          |
|    actor_loss      | -3.43    |
|    critic_loss     | 0.000788 |
|    ent_coef        | 0.00382  |
|    ent_coef_loss   | -7.5     |
|    learning_rate   | 0.0003   |
|    n_updates       | 18707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.3     |
| time/              |          |
|    episodes        | 624      |
|    fps             | 205      |
|    time_elapsed    | 730      |
|    total_timesteps | 149760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.29    |
| time/              |          |
|    episodes        | 628      |
|    fps             | 204      |
|    time_elapsed    | 740      |
|    total_timesteps | 151680   |
| train/             |          |
|    actor_loss      | -3.39    |
|    critic_loss     | 0.000237 |
|    ent_coef        | 0.00356  |
|    ent_coef_loss   | -7.82    |
|    learning_rate   | 0.0003   |
|    n_updates       | 18947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.29    |
| time/              |          |
|    episodes        | 632      |
|    fps             | 204      |
|    time_elapsed    | 740      |
|    total_timesteps | 151680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.28    |
| time/              |          |
|    episodes        | 636      |
|    fps             | 205      |
|    time_elapsed    | 748      |
|    total_timesteps | 153600   |
| train/             |          |
|    actor_loss      | -3.35    |
|    critic_loss     | 0.00076  |
|    ent_coef        | 0.00332  |
|    ent_coef_loss   | -7.44    |
|    learning_rate   | 0.0003   |
|    n_updates       | 19187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.28    |
| time/              |          |
|    episodes        | 640      |
|    fps             | 205      |
|    time_elapsed    | 748      |
|    total_timesteps | 153600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.26    |
| time/              |          |
|    episodes        | 644      |
|    fps             | 205      |
|    time_elapsed    | 756      |
|    total_timesteps | 155520   |
| train/             |          |
|    actor_loss      | -3.29    |
|    critic_loss     | 0.000223 |
|    ent_coef        | 0.0031   |
|    ent_coef_loss   | -7.44    |
|    learning_rate   | 0.0003   |
|    n_updates       | 19427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.26    |
| time/              |          |
|    episodes        | 648      |
|    fps             | 205      |
|    time_elapsed    | 756      |
|    total_timesteps | 155520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.23    |
| time/              |          |
|    episodes        | 652      |
|    fps             | 205      |
|    time_elapsed    | 766      |
|    total_timesteps | 157440   |
| train/             |          |
|    actor_loss      | -3.25    |
|    critic_loss     | 0.000113 |
|    ent_coef        | 0.0029   |
|    ent_coef_loss   | -6.65    |
|    learning_rate   | 0.0003   |
|    n_updates       | 19667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.23    |
| time/              |          |
|    episodes        | 656      |
|    fps             | 205      |
|    time_elapsed    | 766      |
|    total_timesteps | 157440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.2     |
| time/              |          |
|    episodes        | 660      |
|    fps             | 205      |
|    time_elapsed    | 773      |
|    total_timesteps | 159360   |
| train/             |          |
|    actor_loss      | -3.2     |
|    critic_loss     | 0.000211 |
|    ent_coef        | 0.00272  |
|    ent_coef_loss   | -7.51    |
|    learning_rate   | 0.0003   |
|    n_updates       | 19907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.2     |
| time/              |          |
|    episodes        | 664      |
|    fps             | 205      |
|    time_elapsed    | 773      |
|    total_timesteps | 159360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.19    |
| time/              |          |
|    episodes        | 668      |
|    fps             | 206      |
|    time_elapsed    | 780      |
|    total_timesteps | 161280   |
| train/             |          |
|    actor_loss      | -3.16    |
|    critic_loss     | 0.00015  |
|    ent_coef        | 0.00254  |
|    ent_coef_loss   | -6.74    |
|    learning_rate   | 0.0003   |
|    n_updates       | 20147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.19    |
| time/              |          |
|    episodes        | 672      |
|    fps             | 206      |
|    time_elapsed    | 780      |
|    total_timesteps | 161280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.18    |
| time/              |          |
|    episodes        | 676      |
|    fps             | 207      |
|    time_elapsed    | 786      |
|    total_timesteps | 163200   |
| train/             |          |
|    actor_loss      | -3.09    |
|    critic_loss     | 0.000908 |
|    ent_coef        | 0.00238  |
|    ent_coef_loss   | -7.36    |
|    learning_rate   | 0.0003   |
|    n_updates       | 20387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.18    |
| time/              |          |
|    episodes        | 680      |
|    fps             | 207      |
|    time_elapsed    | 786      |
|    total_timesteps | 163200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.15    |
| time/              |          |
|    episodes        | 684      |
|    fps             | 208      |
|    time_elapsed    | 793      |
|    total_timesteps | 165120   |
| train/             |          |
|    actor_loss      | -3.07    |
|    critic_loss     | 0.000178 |
|    ent_coef        | 0.00223  |
|    ent_coef_loss   | -4.22    |
|    learning_rate   | 0.0003   |
|    n_updates       | 20627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.15    |
| time/              |          |
|    episodes        | 688      |
|    fps             | 208      |
|    time_elapsed    | 793      |
|    total_timesteps | 165120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.13    |
| time/              |          |
|    episodes        | 692      |
|    fps             | 208      |
|    time_elapsed    | 801      |
|    total_timesteps | 167040   |
| train/             |          |
|    actor_loss      | -3.02    |
|    critic_loss     | 0.000184 |
|    ent_coef        | 0.00213  |
|    ent_coef_loss   | -5.1     |
|    learning_rate   | 0.0003   |
|    n_updates       | 20867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.13    |
| time/              |          |
|    episodes        | 696      |
|    fps             | 208      |
|    time_elapsed    | 801      |
|    total_timesteps | 167040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.11    |
| time/              |          |
|    episodes        | 700      |
|    fps             | 209      |
|    time_elapsed    | 807      |
|    total_timesteps | 168960   |
| train/             |          |
|    actor_loss      | -2.98    |
|    critic_loss     | 0.000333 |
|    ent_coef        | 0.00202  |
|    ent_coef_loss   | -5.28    |
|    learning_rate   | 0.0003   |
|    n_updates       | 21107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.11    |
| time/              |          |
|    episodes        | 704      |
|    fps             | 209      |
|    time_elapsed    | 807      |
|    total_timesteps | 168960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.08    |
| time/              |          |
|    episodes        | 708      |
|    fps             | 209      |
|    time_elapsed    | 815      |
|    total_timesteps | 170880   |
| train/             |          |
|    actor_loss      | -2.94    |
|    critic_loss     | 0.000182 |
|    ent_coef        | 0.00189  |
|    ent_coef_loss   | -6.4     |
|    learning_rate   | 0.0003   |
|    n_updates       | 21347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.08    |
| time/              |          |
|    episodes        | 712      |
|    fps             | 209      |
|    time_elapsed    | 815      |
|    total_timesteps | 170880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.04    |
| time/              |          |
|    episodes        | 716      |
|    fps             | 210      |
|    time_elapsed    | 822      |
|    total_timesteps | 172800   |
| train/             |          |
|    actor_loss      | -2.9     |
|    critic_loss     | 0.000238 |
|    ent_coef        | 0.00177  |
|    ent_coef_loss   | -5.59    |
|    learning_rate   | 0.0003   |
|    n_updates       | 21587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.04    |
| time/              |          |
|    episodes        | 720      |
|    fps             | 210      |
|    time_elapsed    | 822      |
|    total_timesteps | 172800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.01    |
| time/              |          |
|    episodes        | 724      |
|    fps             | 210      |
|    time_elapsed    | 828      |
|    total_timesteps | 174720   |
| train/             |          |
|    actor_loss      | -2.86    |
|    critic_loss     | 0.000349 |
|    ent_coef        | 0.00165  |
|    ent_coef_loss   | -6.11    |
|    learning_rate   | 0.0003   |
|    n_updates       | 21827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1.01    |
| time/              |          |
|    episodes        | 728      |
|    fps             | 210      |
|    time_elapsed    | 828      |
|    total_timesteps | 174720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1       |
| time/              |          |
|    episodes        | 732      |
|    fps             | 211      |
|    time_elapsed    | 835      |
|    total_timesteps | 176640   |
| train/             |          |
|    actor_loss      | -2.83    |
|    critic_loss     | 0.000119 |
|    ent_coef        | 0.00154  |
|    ent_coef_loss   | -5.74    |
|    learning_rate   | 0.0003   |
|    n_updates       | 22067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -1       |
| time/              |          |
|    episodes        | 736      |
|    fps             | 211      |
|    time_elapsed    | 835      |
|    total_timesteps | 176640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.974   |
| time/              |          |
|    episodes        | 740      |
|    fps             | 211      |
|    time_elapsed    | 842      |
|    total_timesteps | 178560   |
| train/             |          |
|    actor_loss      | -2.76    |
|    critic_loss     | 0.00093  |
|    ent_coef        | 0.00144  |
|    ent_coef_loss   | -5.75    |
|    learning_rate   | 0.0003   |
|    n_updates       | 22307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.974   |
| time/              |          |
|    episodes        | 744      |
|    fps             | 211      |
|    time_elapsed    | 842      |
|    total_timesteps | 178560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.948   |
| time/              |          |
|    episodes        | 748      |
|    fps             | 212      |
|    time_elapsed    | 850      |
|    total_timesteps | 180480   |
| train/             |          |
|    actor_loss      | -2.75    |
|    critic_loss     | 0.000382 |
|    ent_coef        | 0.00134  |
|    ent_coef_loss   | -6.73    |
|    learning_rate   | 0.0003   |
|    n_updates       | 22547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.948   |
| time/              |          |
|    episodes        | 752      |
|    fps             | 212      |
|    time_elapsed    | 850      |
|    total_timesteps | 180480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.926   |
| time/              |          |
|    episodes        | 756      |
|    fps             | 212      |
|    time_elapsed    | 857      |
|    total_timesteps | 182400   |
| train/             |          |
|    actor_loss      | -2.7     |
|    critic_loss     | 0.00105  |
|    ent_coef        | 0.00126  |
|    ent_coef_loss   | -6.14    |
|    learning_rate   | 0.0003   |
|    n_updates       | 22787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.926   |
| time/              |          |
|    episodes        | 760      |
|    fps             | 212      |
|    time_elapsed    | 857      |
|    total_timesteps | 182400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.887   |
| time/              |          |
|    episodes        | 764      |
|    fps             | 213      |
|    time_elapsed    | 863      |
|    total_timesteps | 184320   |
| train/             |          |
|    actor_loss      | -2.66    |
|    critic_loss     | 0.000415 |
|    ent_coef        | 0.00118  |
|    ent_coef_loss   | -5.24    |
|    learning_rate   | 0.0003   |
|    n_updates       | 23027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.887   |
| time/              |          |
|    episodes        | 768      |
|    fps             | 213      |
|    time_elapsed    | 863      |
|    total_timesteps | 184320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.866   |
| time/              |          |
|    episodes        | 772      |
|    fps             | 213      |
|    time_elapsed    | 870      |
|    total_timesteps | 186240   |
| train/             |          |
|    actor_loss      | -2.64    |
|    critic_loss     | 0.000142 |
|    ent_coef        | 0.00111  |
|    ent_coef_loss   | -5.55    |
|    learning_rate   | 0.0003   |
|    n_updates       | 23267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.866   |
| time/              |          |
|    episodes        | 776      |
|    fps             | 213      |
|    time_elapsed    | 870      |
|    total_timesteps | 186240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.834   |
| time/              |          |
|    episodes        | 780      |
|    fps             | 214      |
|    time_elapsed    | 876      |
|    total_timesteps | 188160   |
| train/             |          |
|    actor_loss      | -2.59    |
|    critic_loss     | 0.000141 |
|    ent_coef        | 0.00105  |
|    ent_coef_loss   | -5.35    |
|    learning_rate   | 0.0003   |
|    n_updates       | 23507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.834   |
| time/              |          |
|    episodes        | 784      |
|    fps             | 214      |
|    time_elapsed    | 876      |
|    total_timesteps | 188160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.821   |
| time/              |          |
|    episodes        | 788      |
|    fps             | 214      |
|    time_elapsed    | 884      |
|    total_timesteps | 190080   |
| train/             |          |
|    actor_loss      | -2.57    |
|    critic_loss     | 0.000169 |
|    ent_coef        | 0.000988 |
|    ent_coef_loss   | -5.11    |
|    learning_rate   | 0.0003   |
|    n_updates       | 23747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.821   |
| time/              |          |
|    episodes        | 792      |
|    fps             | 214      |
|    time_elapsed    | 884      |
|    total_timesteps | 190080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.804   |
| time/              |          |
|    episodes        | 796      |
|    fps             | 215      |
|    time_elapsed    | 890      |
|    total_timesteps | 192000   |
| train/             |          |
|    actor_loss      | -2.53    |
|    critic_loss     | 0.000137 |
|    ent_coef        | 0.000932 |
|    ent_coef_loss   | -4.63    |
|    learning_rate   | 0.0003   |
|    n_updates       | 23987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.804   |
| time/              |          |
|    episodes        | 800      |
|    fps             | 215      |
|    time_elapsed    | 890      |
|    total_timesteps | 192000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.766   |
| time/              |          |
|    episodes        | 804      |
|    fps             | 216      |
|    time_elapsed    | 896      |
|    total_timesteps | 193920   |
| train/             |          |
|    actor_loss      | -2.49    |
|    critic_loss     | 0.000516 |
|    ent_coef        | 0.000882 |
|    ent_coef_loss   | -4.49    |
|    learning_rate   | 0.0003   |
|    n_updates       | 24227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.766   |
| time/              |          |
|    episodes        | 808      |
|    fps             | 216      |
|    time_elapsed    | 896      |
|    total_timesteps | 193920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.75    |
| time/              |          |
|    episodes        | 812      |
|    fps             | 216      |
|    time_elapsed    | 903      |
|    total_timesteps | 195840   |
| train/             |          |
|    actor_loss      | -2.46    |
|    critic_loss     | 0.000126 |
|    ent_coef        | 0.000832 |
|    ent_coef_loss   | -4.5     |
|    learning_rate   | 0.0003   |
|    n_updates       | 24467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.75    |
| time/              |          |
|    episodes        | 816      |
|    fps             | 216      |
|    time_elapsed    | 903      |
|    total_timesteps | 195840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.744   |
| time/              |          |
|    episodes        | 820      |
|    fps             | 217      |
|    time_elapsed    | 909      |
|    total_timesteps | 197760   |
| train/             |          |
|    actor_loss      | -2.42    |
|    critic_loss     | 0.000132 |
|    ent_coef        | 0.000784 |
|    ent_coef_loss   | -4.07    |
|    learning_rate   | 0.0003   |
|    n_updates       | 24707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.744   |
| time/              |          |
|    episodes        | 824      |
|    fps             | 217      |
|    time_elapsed    | 909      |
|    total_timesteps | 197760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.718   |
| time/              |          |
|    episodes        | 828      |
|    fps             | 217      |
|    time_elapsed    | 916      |
|    total_timesteps | 199680   |
| train/             |          |
|    actor_loss      | -2.39    |
|    critic_loss     | 0.000302 |
|    ent_coef        | 0.000739 |
|    ent_coef_loss   | -3.89    |
|    learning_rate   | 0.0003   |
|    n_updates       | 24947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.718   |
| time/              |          |
|    episodes        | 832      |
|    fps             | 217      |
|    time_elapsed    | 916      |
|    total_timesteps | 199680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.698   |
| time/              |          |
|    episodes        | 836      |
|    fps             | 218      |
|    time_elapsed    | 922      |
|    total_timesteps | 201600   |
| train/             |          |
|    actor_loss      | -2.35    |
|    critic_loss     | 0.000462 |
|    ent_coef        | 0.000699 |
|    ent_coef_loss   | -2.07    |
|    learning_rate   | 0.0003   |
|    n_updates       | 25187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.698   |
| time/              |          |
|    episodes        | 840      |
|    fps             | 218      |
|    time_elapsed    | 922      |
|    total_timesteps | 201600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 844      |
|    fps             | 218      |
|    time_elapsed    | 930      |
|    total_timesteps | 203520   |
| train/             |          |
|    actor_loss      | -2.32    |
|    critic_loss     | 0.000811 |
|    ent_coef        | 0.000661 |
|    ent_coef_loss   | -3.51    |
|    learning_rate   | 0.0003   |
|    n_updates       | 25427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 848      |
|    fps             | 218      |
|    time_elapsed    | 930      |
|    total_timesteps | 203520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.677   |
| time/              |          |
|    episodes        | 852      |
|    fps             | 218      |
|    time_elapsed    | 939      |
|    total_timesteps | 205440   |
| train/             |          |
|    actor_loss      | -2.28    |
|    critic_loss     | 0.00013  |
|    ent_coef        | 0.000627 |
|    ent_coef_loss   | -1.9     |
|    learning_rate   | 0.0003   |
|    n_updates       | 25667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.677   |
| time/              |          |
|    episodes        | 856      |
|    fps             | 218      |
|    time_elapsed    | 939      |
|    total_timesteps | 205440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.659   |
| time/              |          |
|    episodes        | 860      |
|    fps             | 218      |
|    time_elapsed    | 948      |
|    total_timesteps | 207360   |
| train/             |          |
|    actor_loss      | -2.24    |
|    critic_loss     | 0.00023  |
|    ent_coef        | 0.000599 |
|    ent_coef_loss   | -2.45    |
|    learning_rate   | 0.0003   |
|    n_updates       | 25907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.659   |
| time/              |          |
|    episodes        | 864      |
|    fps             | 218      |
|    time_elapsed    | 948      |
|    total_timesteps | 207360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.652   |
| time/              |          |
|    episodes        | 868      |
|    fps             | 218      |
|    time_elapsed    | 956      |
|    total_timesteps | 209280   |
| train/             |          |
|    actor_loss      | -2.22    |
|    critic_loss     | 0.00011  |
|    ent_coef        | 0.000573 |
|    ent_coef_loss   | -2.88    |
|    learning_rate   | 0.0003   |
|    n_updates       | 26147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.652   |
| time/              |          |
|    episodes        | 872      |
|    fps             | 218      |
|    time_elapsed    | 956      |
|    total_timesteps | 209280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.644   |
| time/              |          |
|    episodes        | 876      |
|    fps             | 218      |
|    time_elapsed    | 964      |
|    total_timesteps | 211200   |
| train/             |          |
|    actor_loss      | -2.19    |
|    critic_loss     | 0.000159 |
|    ent_coef        | 0.000546 |
|    ent_coef_loss   | -2.01    |
|    learning_rate   | 0.0003   |
|    n_updates       | 26387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.644   |
| time/              |          |
|    episodes        | 880      |
|    fps             | 218      |
|    time_elapsed    | 964      |
|    total_timesteps | 211200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.642   |
| time/              |          |
|    episodes        | 884      |
|    fps             | 219      |
|    time_elapsed    | 969      |
|    total_timesteps | 213120   |
| train/             |          |
|    actor_loss      | -2.17    |
|    critic_loss     | 0.000353 |
|    ent_coef        | 0.00052  |
|    ent_coef_loss   | -1.89    |
|    learning_rate   | 0.0003   |
|    n_updates       | 26627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.642   |
| time/              |          |
|    episodes        | 888      |
|    fps             | 219      |
|    time_elapsed    | 969      |
|    total_timesteps | 213120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.632   |
| time/              |          |
|    episodes        | 892      |
|    fps             | 220      |
|    time_elapsed    | 975      |
|    total_timesteps | 215040   |
| train/             |          |
|    actor_loss      | -2.12    |
|    critic_loss     | 0.000202 |
|    ent_coef        | 0.000497 |
|    ent_coef_loss   | -0.648   |
|    learning_rate   | 0.0003   |
|    n_updates       | 26867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.632   |
| time/              |          |
|    episodes        | 896      |
|    fps             | 220      |
|    time_elapsed    | 975      |
|    total_timesteps | 215040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.638   |
| time/              |          |
|    episodes        | 900      |
|    fps             | 220      |
|    time_elapsed    | 982      |
|    total_timesteps | 216960   |
| train/             |          |
|    actor_loss      | -2.1     |
|    critic_loss     | 0.000139 |
|    ent_coef        | 0.000476 |
|    ent_coef_loss   | -1.49    |
|    learning_rate   | 0.0003   |
|    n_updates       | 27107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.638   |
| time/              |          |
|    episodes        | 904      |
|    fps             | 220      |
|    time_elapsed    | 982      |
|    total_timesteps | 216960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.647   |
| time/              |          |
|    episodes        | 908      |
|    fps             | 221      |
|    time_elapsed    | 990      |
|    total_timesteps | 218880   |
| train/             |          |
|    actor_loss      | -2.06    |
|    critic_loss     | 0.000129 |
|    ent_coef        | 0.000459 |
|    ent_coef_loss   | -1.32    |
|    learning_rate   | 0.0003   |
|    n_updates       | 27347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.647   |
| time/              |          |
|    episodes        | 912      |
|    fps             | 221      |
|    time_elapsed    | 990      |
|    total_timesteps | 218880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.626   |
| time/              |          |
|    episodes        | 916      |
|    fps             | 221      |
|    time_elapsed    | 998      |
|    total_timesteps | 220800   |
| train/             |          |
|    actor_loss      | -2.04    |
|    critic_loss     | 0.000146 |
|    ent_coef        | 0.000443 |
|    ent_coef_loss   | -0.663   |
|    learning_rate   | 0.0003   |
|    n_updates       | 27587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.626   |
| time/              |          |
|    episodes        | 920      |
|    fps             | 221      |
|    time_elapsed    | 998      |
|    total_timesteps | 220800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 924      |
|    fps             | 221      |
|    time_elapsed    | 1006     |
|    total_timesteps | 222720   |
| train/             |          |
|    actor_loss      | -2.01    |
|    critic_loss     | 0.000143 |
|    ent_coef        | 0.000427 |
|    ent_coef_loss   | -1.93    |
|    learning_rate   | 0.0003   |
|    n_updates       | 27827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 928      |
|    fps             | 221      |
|    time_elapsed    | 1006     |
|    total_timesteps | 222720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.635   |
| time/              |          |
|    episodes        | 932      |
|    fps             | 221      |
|    time_elapsed    | 1014     |
|    total_timesteps | 224640   |
| train/             |          |
|    actor_loss      | -1.98    |
|    critic_loss     | 0.000162 |
|    ent_coef        | 0.000414 |
|    ent_coef_loss   | -0.58    |
|    learning_rate   | 0.0003   |
|    n_updates       | 28067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.635   |
| time/              |          |
|    episodes        | 936      |
|    fps             | 221      |
|    time_elapsed    | 1014     |
|    total_timesteps | 224640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.624   |
| time/              |          |
|    episodes        | 940      |
|    fps             | 221      |
|    time_elapsed    | 1022     |
|    total_timesteps | 226560   |
| train/             |          |
|    actor_loss      | -1.95    |
|    critic_loss     | 0.00107  |
|    ent_coef        | 0.0004   |
|    ent_coef_loss   | -1.88    |
|    learning_rate   | 0.0003   |
|    n_updates       | 28307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.624   |
| time/              |          |
|    episodes        | 944      |
|    fps             | 221      |
|    time_elapsed    | 1022     |
|    total_timesteps | 226560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.617   |
| time/              |          |
|    episodes        | 948      |
|    fps             | 221      |
|    time_elapsed    | 1031     |
|    total_timesteps | 228480   |
| train/             |          |
|    actor_loss      | -1.93    |
|    critic_loss     | 0.00048  |
|    ent_coef        | 0.000388 |
|    ent_coef_loss   | 0.723    |
|    learning_rate   | 0.0003   |
|    n_updates       | 28547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.617   |
| time/              |          |
|    episodes        | 952      |
|    fps             | 221      |
|    time_elapsed    | 1031     |
|    total_timesteps | 228480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.622   |
| time/              |          |
|    episodes        | 956      |
|    fps             | 222      |
|    time_elapsed    | 1037     |
|    total_timesteps | 230400   |
| train/             |          |
|    actor_loss      | -1.91    |
|    critic_loss     | 0.000223 |
|    ent_coef        | 0.000382 |
|    ent_coef_loss   | -0.965   |
|    learning_rate   | 0.0003   |
|    n_updates       | 28787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.622   |
| time/              |          |
|    episodes        | 960      |
|    fps             | 222      |
|    time_elapsed    | 1037     |
|    total_timesteps | 230400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 964      |
|    fps             | 222      |
|    time_elapsed    | 1043     |
|    total_timesteps | 232320   |
| train/             |          |
|    actor_loss      | -1.88    |
|    critic_loss     | 0.000165 |
|    ent_coef        | 0.000371 |
|    ent_coef_loss   | -0.373   |
|    learning_rate   | 0.0003   |
|    n_updates       | 29027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 968      |
|    fps             | 222      |
|    time_elapsed    | 1043     |
|    total_timesteps | 232320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.64    |
| time/              |          |
|    episodes        | 972      |
|    fps             | 223      |
|    time_elapsed    | 1048     |
|    total_timesteps | 234240   |
| train/             |          |
|    actor_loss      | -1.85    |
|    critic_loss     | 0.000381 |
|    ent_coef        | 0.00036  |
|    ent_coef_loss   | -0.816   |
|    learning_rate   | 0.0003   |
|    n_updates       | 29267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.64    |
| time/              |          |
|    episodes        | 976      |
|    fps             | 223      |
|    time_elapsed    | 1048     |
|    total_timesteps | 234240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.619   |
| time/              |          |
|    episodes        | 980      |
|    fps             | 223      |
|    time_elapsed    | 1057     |
|    total_timesteps | 236160   |
| train/             |          |
|    actor_loss      | -1.82    |
|    critic_loss     | 0.000243 |
|    ent_coef        | 0.000359 |
|    ent_coef_loss   | -0.813   |
|    learning_rate   | 0.0003   |
|    n_updates       | 29507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.619   |
| time/              |          |
|    episodes        | 984      |
|    fps             | 223      |
|    time_elapsed    | 1057     |
|    total_timesteps | 236160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.622   |
| time/              |          |
|    episodes        | 988      |
|    fps             | 223      |
|    time_elapsed    | 1067     |
|    total_timesteps | 238080   |
| train/             |          |
|    actor_loss      | -1.8     |
|    critic_loss     | 0.000154 |
|    ent_coef        | 0.000362 |
|    ent_coef_loss   | 0.458    |
|    learning_rate   | 0.0003   |
|    n_updates       | 29747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.622   |
| time/              |          |
|    episodes        | 992      |
|    fps             | 223      |
|    time_elapsed    | 1067     |
|    total_timesteps | 238080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.61    |
| time/              |          |
|    episodes        | 996      |
|    fps             | 206      |
|    time_elapsed    | 1163     |
|    total_timesteps | 240000   |
| train/             |          |
|    actor_loss      | -1.77    |
|    critic_loss     | 0.000208 |
|    ent_coef        | 0.000367 |
|    ent_coef_loss   | -0.147   |
|    learning_rate   | 0.0003   |
|    n_updates       | 29987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.61    |
| time/              |          |
|    episodes        | 1000     |
|    fps             | 206      |
|    time_elapsed    | 1163     |
|    total_timesteps | 240000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.594   |
| time/              |          |
|    episodes        | 1004     |
|    fps             | 205      |
|    time_elapsed    | 1179     |
|    total_timesteps | 241920   |
| train/             |          |
|    actor_loss      | -1.74    |
|    critic_loss     | 0.000218 |
|    ent_coef        | 0.00037  |
|    ent_coef_loss   | 0.498    |
|    learning_rate   | 0.0003   |
|    n_updates       | 30227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.594   |
| time/              |          |
|    episodes        | 1008     |
|    fps             | 205      |
|    time_elapsed    | 1179     |
|    total_timesteps | 241920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.58    |
| time/              |          |
|    episodes        | 1012     |
|    fps             | 204      |
|    time_elapsed    | 1190     |
|    total_timesteps | 243840   |
| train/             |          |
|    actor_loss      | -1.73    |
|    critic_loss     | 0.00015  |
|    ent_coef        | 0.000376 |
|    ent_coef_loss   | 0.94     |
|    learning_rate   | 0.0003   |
|    n_updates       | 30467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.58    |
| time/              |          |
|    episodes        | 1016     |
|    fps             | 204      |
|    time_elapsed    | 1190     |
|    total_timesteps | 243840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.585   |
| time/              |          |
|    episodes        | 1020     |
|    fps             | 204      |
|    time_elapsed    | 1200     |
|    total_timesteps | 245760   |
| train/             |          |
|    actor_loss      | -1.7     |
|    critic_loss     | 0.000166 |
|    ent_coef        | 0.000381 |
|    ent_coef_loss   | 0.524    |
|    learning_rate   | 0.0003   |
|    n_updates       | 30707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.585   |
| time/              |          |
|    episodes        | 1024     |
|    fps             | 204      |
|    time_elapsed    | 1200     |
|    total_timesteps | 245760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.572   |
| time/              |          |
|    episodes        | 1028     |
|    fps             | 204      |
|    time_elapsed    | 1210     |
|    total_timesteps | 247680   |
| train/             |          |
|    actor_loss      | -1.68    |
|    critic_loss     | 0.000161 |
|    ent_coef        | 0.000385 |
|    ent_coef_loss   | 0.326    |
|    learning_rate   | 0.0003   |
|    n_updates       | 30947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.572   |
| time/              |          |
|    episodes        | 1032     |
|    fps             | 204      |
|    time_elapsed    | 1210     |
|    total_timesteps | 247680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.577   |
| time/              |          |
|    episodes        | 1036     |
|    fps             | 204      |
|    time_elapsed    | 1220     |
|    total_timesteps | 249600   |
| train/             |          |
|    actor_loss      | -1.65    |
|    critic_loss     | 0.00014  |
|    ent_coef        | 0.00039  |
|    ent_coef_loss   | 0.769    |
|    learning_rate   | 0.0003   |
|    n_updates       | 31187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.577   |
| time/              |          |
|    episodes        | 1040     |
|    fps             | 204      |
|    time_elapsed    | 1220     |
|    total_timesteps | 249600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.61    |
| time/              |          |
|    episodes        | 1044     |
|    fps             | 203      |
|    time_elapsed    | 1233     |
|    total_timesteps | 251520   |
| train/             |          |
|    actor_loss      | -1.62    |
|    critic_loss     | 0.000202 |
|    ent_coef        | 0.000392 |
|    ent_coef_loss   | -0.204   |
|    learning_rate   | 0.0003   |
|    n_updates       | 31427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.61    |
| time/              |          |
|    episodes        | 1048     |
|    fps             | 203      |
|    time_elapsed    | 1233     |
|    total_timesteps | 251520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 1052     |
|    fps             | 204      |
|    time_elapsed    | 1242     |
|    total_timesteps | 253440   |
| train/             |          |
|    actor_loss      | -1.6     |
|    critic_loss     | 0.00111  |
|    ent_coef        | 0.000387 |
|    ent_coef_loss   | 0.612    |
|    learning_rate   | 0.0003   |
|    n_updates       | 31667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 1056     |
|    fps             | 204      |
|    time_elapsed    | 1242     |
|    total_timesteps | 253440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.626   |
| time/              |          |
|    episodes        | 1060     |
|    fps             | 204      |
|    time_elapsed    | 1251     |
|    total_timesteps | 255360   |
| train/             |          |
|    actor_loss      | -1.58    |
|    critic_loss     | 0.000352 |
|    ent_coef        | 0.000386 |
|    ent_coef_loss   | -1.06    |
|    learning_rate   | 0.0003   |
|    n_updates       | 31907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.626   |
| time/              |          |
|    episodes        | 1064     |
|    fps             | 204      |
|    time_elapsed    | 1251     |
|    total_timesteps | 255360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.615   |
| time/              |          |
|    episodes        | 1068     |
|    fps             | 204      |
|    time_elapsed    | 1260     |
|    total_timesteps | 257280   |
| train/             |          |
|    actor_loss      | -1.57    |
|    critic_loss     | 0.000213 |
|    ent_coef        | 0.000384 |
|    ent_coef_loss   | 0.34     |
|    learning_rate   | 0.0003   |
|    n_updates       | 32147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.615   |
| time/              |          |
|    episodes        | 1072     |
|    fps             | 204      |
|    time_elapsed    | 1260     |
|    total_timesteps | 257280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.62    |
| time/              |          |
|    episodes        | 1076     |
|    fps             | 204      |
|    time_elapsed    | 1269     |
|    total_timesteps | 259200   |
| train/             |          |
|    actor_loss      | -1.54    |
|    critic_loss     | 0.000223 |
|    ent_coef        | 0.000385 |
|    ent_coef_loss   | 0.315    |
|    learning_rate   | 0.0003   |
|    n_updates       | 32387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.62    |
| time/              |          |
|    episodes        | 1080     |
|    fps             | 204      |
|    time_elapsed    | 1269     |
|    total_timesteps | 259200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.642   |
| time/              |          |
|    episodes        | 1084     |
|    fps             | 204      |
|    time_elapsed    | 1278     |
|    total_timesteps | 261120   |
| train/             |          |
|    actor_loss      | -1.5     |
|    critic_loss     | 0.00015  |
|    ent_coef        | 0.000399 |
|    ent_coef_loss   | 2.59     |
|    learning_rate   | 0.0003   |
|    n_updates       | 32627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.642   |
| time/              |          |
|    episodes        | 1088     |
|    fps             | 204      |
|    time_elapsed    | 1278     |
|    total_timesteps | 261120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.647   |
| time/              |          |
|    episodes        | 1092     |
|    fps             | 204      |
|    time_elapsed    | 1287     |
|    total_timesteps | 263040   |
| train/             |          |
|    actor_loss      | -1.49    |
|    critic_loss     | 0.000124 |
|    ent_coef        | 0.00041  |
|    ent_coef_loss   | -0.728   |
|    learning_rate   | 0.0003   |
|    n_updates       | 32867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.647   |
| time/              |          |
|    episodes        | 1096     |
|    fps             | 204      |
|    time_elapsed    | 1287     |
|    total_timesteps | 263040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.666   |
| time/              |          |
|    episodes        | 1100     |
|    fps             | 203      |
|    time_elapsed    | 1301     |
|    total_timesteps | 264960   |
| train/             |          |
|    actor_loss      | -1.47    |
|    critic_loss     | 0.00015  |
|    ent_coef        | 0.000407 |
|    ent_coef_loss   | -0.348   |
|    learning_rate   | 0.0003   |
|    n_updates       | 33107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.666   |
| time/              |          |
|    episodes        | 1104     |
|    fps             | 203      |
|    time_elapsed    | 1301     |
|    total_timesteps | 264960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1108     |
|    fps             | 203      |
|    time_elapsed    | 1311     |
|    total_timesteps | 266880   |
| train/             |          |
|    actor_loss      | -1.45    |
|    critic_loss     | 0.000229 |
|    ent_coef        | 0.000407 |
|    ent_coef_loss   | -0.53    |
|    learning_rate   | 0.0003   |
|    n_updates       | 33347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1112     |
|    fps             | 203      |
|    time_elapsed    | 1311     |
|    total_timesteps | 266880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.687   |
| time/              |          |
|    episodes        | 1116     |
|    fps             | 202      |
|    time_elapsed    | 1325     |
|    total_timesteps | 268800   |
| train/             |          |
|    actor_loss      | -1.43    |
|    critic_loss     | 0.000137 |
|    ent_coef        | 0.000407 |
|    ent_coef_loss   | 0.109    |
|    learning_rate   | 0.0003   |
|    n_updates       | 33587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.687   |
| time/              |          |
|    episodes        | 1120     |
|    fps             | 202      |
|    time_elapsed    | 1325     |
|    total_timesteps | 268800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1124     |
|    fps             | 202      |
|    time_elapsed    | 1337     |
|    total_timesteps | 270720   |
| train/             |          |
|    actor_loss      | -1.4     |
|    critic_loss     | 0.000149 |
|    ent_coef        | 0.000407 |
|    ent_coef_loss   | 0.207    |
|    learning_rate   | 0.0003   |
|    n_updates       | 33827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1128     |
|    fps             | 202      |
|    time_elapsed    | 1337     |
|    total_timesteps | 270720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1132     |
|    fps             | 202      |
|    time_elapsed    | 1347     |
|    total_timesteps | 272640   |
| train/             |          |
|    actor_loss      | -1.39    |
|    critic_loss     | 0.000321 |
|    ent_coef        | 0.000405 |
|    ent_coef_loss   | -0.38    |
|    learning_rate   | 0.0003   |
|    n_updates       | 34067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1136     |
|    fps             | 202      |
|    time_elapsed    | 1347     |
|    total_timesteps | 272640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 1140     |
|    fps             | 202      |
|    time_elapsed    | 1356     |
|    total_timesteps | 274560   |
| train/             |          |
|    actor_loss      | -1.35    |
|    critic_loss     | 0.00013  |
|    ent_coef        | 0.000404 |
|    ent_coef_loss   | 0.514    |
|    learning_rate   | 0.0003   |
|    n_updates       | 34307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 1144     |
|    fps             | 202      |
|    time_elapsed    | 1356     |
|    total_timesteps | 274560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.672   |
| time/              |          |
|    episodes        | 1148     |
|    fps             | 202      |
|    time_elapsed    | 1366     |
|    total_timesteps | 276480   |
| train/             |          |
|    actor_loss      | -1.34    |
|    critic_loss     | 0.000265 |
|    ent_coef        | 0.000412 |
|    ent_coef_loss   | -0.0522  |
|    learning_rate   | 0.0003   |
|    n_updates       | 34547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.672   |
| time/              |          |
|    episodes        | 1152     |
|    fps             | 202      |
|    time_elapsed    | 1366     |
|    total_timesteps | 276480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.679   |
| time/              |          |
|    episodes        | 1156     |
|    fps             | 202      |
|    time_elapsed    | 1375     |
|    total_timesteps | 278400   |
| train/             |          |
|    actor_loss      | -1.32    |
|    critic_loss     | 0.000269 |
|    ent_coef        | 0.000418 |
|    ent_coef_loss   | -0.0604  |
|    learning_rate   | 0.0003   |
|    n_updates       | 34787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.679   |
| time/              |          |
|    episodes        | 1160     |
|    fps             | 202      |
|    time_elapsed    | 1375     |
|    total_timesteps | 278400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 1164     |
|    fps             | 201      |
|    time_elapsed    | 1394     |
|    total_timesteps | 280320   |
| train/             |          |
|    actor_loss      | -1.31    |
|    critic_loss     | 0.000275 |
|    ent_coef        | 0.000426 |
|    ent_coef_loss   | 0.00506  |
|    learning_rate   | 0.0003   |
|    n_updates       | 35027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 1168     |
|    fps             | 201      |
|    time_elapsed    | 1394     |
|    total_timesteps | 280320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.678   |
| time/              |          |
|    episodes        | 1172     |
|    fps             | 194      |
|    time_elapsed    | 1449     |
|    total_timesteps | 282240   |
| train/             |          |
|    actor_loss      | -1.27    |
|    critic_loss     | 0.000263 |
|    ent_coef        | 0.00044  |
|    ent_coef_loss   | -0.236   |
|    learning_rate   | 0.0003   |
|    n_updates       | 35267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.678   |
| time/              |          |
|    episodes        | 1176     |
|    fps             | 194      |
|    time_elapsed    | 1449     |
|    total_timesteps | 282240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.685   |
| time/              |          |
|    episodes        | 1180     |
|    fps             | 194      |
|    time_elapsed    | 1463     |
|    total_timesteps | 284160   |
| train/             |          |
|    actor_loss      | -1.27    |
|    critic_loss     | 0.000127 |
|    ent_coef        | 0.000453 |
|    ent_coef_loss   | -0.545   |
|    learning_rate   | 0.0003   |
|    n_updates       | 35507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.685   |
| time/              |          |
|    episodes        | 1184     |
|    fps             | 194      |
|    time_elapsed    | 1463     |
|    total_timesteps | 284160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 1188     |
|    fps             | 194      |
|    time_elapsed    | 1474     |
|    total_timesteps | 286080   |
| train/             |          |
|    actor_loss      | -1.25    |
|    critic_loss     | 0.000231 |
|    ent_coef        | 0.000456 |
|    ent_coef_loss   | -0.059   |
|    learning_rate   | 0.0003   |
|    n_updates       | 35747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 1192     |
|    fps             | 194      |
|    time_elapsed    | 1474     |
|    total_timesteps | 286080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 1196     |
|    fps             | 193      |
|    time_elapsed    | 1487     |
|    total_timesteps | 288000   |
| train/             |          |
|    actor_loss      | -1.22    |
|    critic_loss     | 0.00016  |
|    ent_coef        | 0.000456 |
|    ent_coef_loss   | 0.102    |
|    learning_rate   | 0.0003   |
|    n_updates       | 35987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 1200     |
|    fps             | 193      |
|    time_elapsed    | 1487     |
|    total_timesteps | 288000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 1204     |
|    fps             | 193      |
|    time_elapsed    | 1500     |
|    total_timesteps | 289920   |
| train/             |          |
|    actor_loss      | -1.21    |
|    critic_loss     | 0.000127 |
|    ent_coef        | 0.000467 |
|    ent_coef_loss   | -0.788   |
|    learning_rate   | 0.0003   |
|    n_updates       | 36227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 1208     |
|    fps             | 193      |
|    time_elapsed    | 1500     |
|    total_timesteps | 289920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.595   |
| time/              |          |
|    episodes        | 1212     |
|    fps             | 190      |
|    time_elapsed    | 1529     |
|    total_timesteps | 291840   |
| train/             |          |
|    actor_loss      | -1.19    |
|    critic_loss     | 0.000365 |
|    ent_coef        | 0.000475 |
|    ent_coef_loss   | 0.321    |
|    learning_rate   | 0.0003   |
|    n_updates       | 36467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.595   |
| time/              |          |
|    episodes        | 1216     |
|    fps             | 190      |
|    time_elapsed    | 1529     |
|    total_timesteps | 291840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.591   |
| time/              |          |
|    episodes        | 1220     |
|    fps             | 189      |
|    time_elapsed    | 1547     |
|    total_timesteps | 293760   |
| train/             |          |
|    actor_loss      | -1.17    |
|    critic_loss     | 0.000136 |
|    ent_coef        | 0.000482 |
|    ent_coef_loss   | 0.466    |
|    learning_rate   | 0.0003   |
|    n_updates       | 36707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.591   |
| time/              |          |
|    episodes        | 1224     |
|    fps             | 189      |
|    time_elapsed    | 1548     |
|    total_timesteps | 293760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.597   |
| time/              |          |
|    episodes        | 1228     |
|    fps             | 189      |
|    time_elapsed    | 1564     |
|    total_timesteps | 295680   |
| train/             |          |
|    actor_loss      | -1.15    |
|    critic_loss     | 0.000151 |
|    ent_coef        | 0.000486 |
|    ent_coef_loss   | 1.27     |
|    learning_rate   | 0.0003   |
|    n_updates       | 36947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.597   |
| time/              |          |
|    episodes        | 1232     |
|    fps             | 189      |
|    time_elapsed    | 1564     |
|    total_timesteps | 295680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.595   |
| time/              |          |
|    episodes        | 1236     |
|    fps             | 188      |
|    time_elapsed    | 1575     |
|    total_timesteps | 297600   |
| train/             |          |
|    actor_loss      | -1.12    |
|    critic_loss     | 0.000157 |
|    ent_coef        | 0.000478 |
|    ent_coef_loss   | 0.199    |
|    learning_rate   | 0.0003   |
|    n_updates       | 37187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.595   |
| time/              |          |
|    episodes        | 1240     |
|    fps             | 188      |
|    time_elapsed    | 1575     |
|    total_timesteps | 297600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.581   |
| time/              |          |
|    episodes        | 1244     |
|    fps             | 188      |
|    time_elapsed    | 1592     |
|    total_timesteps | 299520   |
| train/             |          |
|    actor_loss      | -1.11    |
|    critic_loss     | 0.00014  |
|    ent_coef        | 0.000479 |
|    ent_coef_loss   | 0.214    |
|    learning_rate   | 0.0003   |
|    n_updates       | 37427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.581   |
| time/              |          |
|    episodes        | 1248     |
|    fps             | 188      |
|    time_elapsed    | 1592     |
|    total_timesteps | 299520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.562   |
| time/              |          |
|    episodes        | 1252     |
|    fps             | 186      |
|    time_elapsed    | 1612     |
|    total_timesteps | 301440   |
| train/             |          |
|    actor_loss      | -1.1     |
|    critic_loss     | 0.000137 |
|    ent_coef        | 0.000477 |
|    ent_coef_loss   | -0.449   |
|    learning_rate   | 0.0003   |
|    n_updates       | 37667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.562   |
| time/              |          |
|    episodes        | 1256     |
|    fps             | 186      |
|    time_elapsed    | 1612     |
|    total_timesteps | 301440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.537   |
| time/              |          |
|    episodes        | 1260     |
|    fps             | 186      |
|    time_elapsed    | 1628     |
|    total_timesteps | 303360   |
| train/             |          |
|    actor_loss      | -1.09    |
|    critic_loss     | 0.000123 |
|    ent_coef        | 0.000483 |
|    ent_coef_loss   | -0.205   |
|    learning_rate   | 0.0003   |
|    n_updates       | 37907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.537   |
| time/              |          |
|    episodes        | 1264     |
|    fps             | 186      |
|    time_elapsed    | 1628     |
|    total_timesteps | 303360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.542   |
| time/              |          |
|    episodes        | 1268     |
|    fps             | 185      |
|    time_elapsed    | 1646     |
|    total_timesteps | 305280   |
| train/             |          |
|    actor_loss      | -1.06    |
|    critic_loss     | 0.000138 |
|    ent_coef        | 0.000488 |
|    ent_coef_loss   | -0.528   |
|    learning_rate   | 0.0003   |
|    n_updates       | 38147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.542   |
| time/              |          |
|    episodes        | 1272     |
|    fps             | 185      |
|    time_elapsed    | 1646     |
|    total_timesteps | 305280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.541   |
| time/              |          |
|    episodes        | 1276     |
|    fps             | 185      |
|    time_elapsed    | 1655     |
|    total_timesteps | 307200   |
| train/             |          |
|    actor_loss      | -1.05    |
|    critic_loss     | 0.000173 |
|    ent_coef        | 0.000497 |
|    ent_coef_loss   | -0.317   |
|    learning_rate   | 0.0003   |
|    n_updates       | 38387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.541   |
| time/              |          |
|    episodes        | 1280     |
|    fps             | 185      |
|    time_elapsed    | 1655     |
|    total_timesteps | 307200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.547   |
| time/              |          |
|    episodes        | 1284     |
|    fps             | 184      |
|    time_elapsed    | 1671     |
|    total_timesteps | 309120   |
| train/             |          |
|    actor_loss      | -1.03    |
|    critic_loss     | 0.00016  |
|    ent_coef        | 0.0005   |
|    ent_coef_loss   | 0.467    |
|    learning_rate   | 0.0003   |
|    n_updates       | 38627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.547   |
| time/              |          |
|    episodes        | 1288     |
|    fps             | 184      |
|    time_elapsed    | 1671     |
|    total_timesteps | 309120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.561   |
| time/              |          |
|    episodes        | 1292     |
|    fps             | 183      |
|    time_elapsed    | 1696     |
|    total_timesteps | 311040   |
| train/             |          |
|    actor_loss      | -1.01    |
|    critic_loss     | 0.000138 |
|    ent_coef        | 0.0005   |
|    ent_coef_loss   | 0.127    |
|    learning_rate   | 0.0003   |
|    n_updates       | 38867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.561   |
| time/              |          |
|    episodes        | 1296     |
|    fps             | 183      |
|    time_elapsed    | 1696     |
|    total_timesteps | 311040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.576   |
| time/              |          |
|    episodes        | 1300     |
|    fps             | 181      |
|    time_elapsed    | 1722     |
|    total_timesteps | 312960   |
| train/             |          |
|    actor_loss      | -0.994   |
|    critic_loss     | 0.0002   |
|    ent_coef        | 0.000502 |
|    ent_coef_loss   | -0.31    |
|    learning_rate   | 0.0003   |
|    n_updates       | 39107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.576   |
| time/              |          |
|    episodes        | 1304     |
|    fps             | 181      |
|    time_elapsed    | 1722     |
|    total_timesteps | 312960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 1308     |
|    fps             | 180      |
|    time_elapsed    | 1742     |
|    total_timesteps | 314880   |
| train/             |          |
|    actor_loss      | -0.977   |
|    critic_loss     | 0.000138 |
|    ent_coef        | 0.00048  |
|    ent_coef_loss   | -0.366   |
|    learning_rate   | 0.0003   |
|    n_updates       | 39347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 1312     |
|    fps             | 180      |
|    time_elapsed    | 1742     |
|    total_timesteps | 314880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.577   |
| time/              |          |
|    episodes        | 1316     |
|    fps             | 179      |
|    time_elapsed    | 1762     |
|    total_timesteps | 316800   |
| train/             |          |
|    actor_loss      | -0.964   |
|    critic_loss     | 0.000152 |
|    ent_coef        | 0.000458 |
|    ent_coef_loss   | 1.03     |
|    learning_rate   | 0.0003   |
|    n_updates       | 39587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.577   |
| time/              |          |
|    episodes        | 1320     |
|    fps             | 179      |
|    time_elapsed    | 1762     |
|    total_timesteps | 316800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.578   |
| time/              |          |
|    episodes        | 1324     |
|    fps             | 179      |
|    time_elapsed    | 1780     |
|    total_timesteps | 318720   |
| train/             |          |
|    actor_loss      | -0.96    |
|    critic_loss     | 0.000164 |
|    ent_coef        | 0.000458 |
|    ent_coef_loss   | 0.355    |
|    learning_rate   | 0.0003   |
|    n_updates       | 39827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.578   |
| time/              |          |
|    episodes        | 1328     |
|    fps             | 179      |
|    time_elapsed    | 1780     |
|    total_timesteps | 318720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.566   |
| time/              |          |
|    episodes        | 1332     |
|    fps             | 178      |
|    time_elapsed    | 1797     |
|    total_timesteps | 320640   |
| train/             |          |
|    actor_loss      | -0.928   |
|    critic_loss     | 0.000134 |
|    ent_coef        | 0.000463 |
|    ent_coef_loss   | 0.79     |
|    learning_rate   | 0.0003   |
|    n_updates       | 40067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.566   |
| time/              |          |
|    episodes        | 1336     |
|    fps             | 178      |
|    time_elapsed    | 1797     |
|    total_timesteps | 320640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.566   |
| time/              |          |
|    episodes        | 1340     |
|    fps             | 177      |
|    time_elapsed    | 1816     |
|    total_timesteps | 322560   |
| train/             |          |
|    actor_loss      | -0.921   |
|    critic_loss     | 0.000162 |
|    ent_coef        | 0.000475 |
|    ent_coef_loss   | -0.241   |
|    learning_rate   | 0.0003   |
|    n_updates       | 40307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.566   |
| time/              |          |
|    episodes        | 1344     |
|    fps             | 177      |
|    time_elapsed    | 1816     |
|    total_timesteps | 322560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.594   |
| time/              |          |
|    episodes        | 1348     |
|    fps             | 176      |
|    time_elapsed    | 1837     |
|    total_timesteps | 324480   |
| train/             |          |
|    actor_loss      | -0.906   |
|    critic_loss     | 0.000146 |
|    ent_coef        | 0.000474 |
|    ent_coef_loss   | -0.523   |
|    learning_rate   | 0.0003   |
|    n_updates       | 40547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.594   |
| time/              |          |
|    episodes        | 1352     |
|    fps             | 176      |
|    time_elapsed    | 1837     |
|    total_timesteps | 324480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.597   |
| time/              |          |
|    episodes        | 1356     |
|    fps             | 175      |
|    time_elapsed    | 1859     |
|    total_timesteps | 326400   |
| train/             |          |
|    actor_loss      | -0.883   |
|    critic_loss     | 0.000154 |
|    ent_coef        | 0.000472 |
|    ent_coef_loss   | -0.731   |
|    learning_rate   | 0.0003   |
|    n_updates       | 40787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.597   |
| time/              |          |
|    episodes        | 1360     |
|    fps             | 175      |
|    time_elapsed    | 1859     |
|    total_timesteps | 326400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.596   |
| time/              |          |
|    episodes        | 1364     |
|    fps             | 174      |
|    time_elapsed    | 1878     |
|    total_timesteps | 328320   |
| train/             |          |
|    actor_loss      | -0.872   |
|    critic_loss     | 0.000188 |
|    ent_coef        | 0.000471 |
|    ent_coef_loss   | 0.153    |
|    learning_rate   | 0.0003   |
|    n_updates       | 41027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.596   |
| time/              |          |
|    episodes        | 1368     |
|    fps             | 174      |
|    time_elapsed    | 1878     |
|    total_timesteps | 328320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.624   |
| time/              |          |
|    episodes        | 1372     |
|    fps             | 173      |
|    time_elapsed    | 1899     |
|    total_timesteps | 330240   |
| train/             |          |
|    actor_loss      | -0.865   |
|    critic_loss     | 0.00013  |
|    ent_coef        | 0.000475 |
|    ent_coef_loss   | 0.33     |
|    learning_rate   | 0.0003   |
|    n_updates       | 41267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.624   |
| time/              |          |
|    episodes        | 1376     |
|    fps             | 173      |
|    time_elapsed    | 1899     |
|    total_timesteps | 330240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.62    |
| time/              |          |
|    episodes        | 1380     |
|    fps             | 172      |
|    time_elapsed    | 1925     |
|    total_timesteps | 332160   |
| train/             |          |
|    actor_loss      | -0.845   |
|    critic_loss     | 0.000133 |
|    ent_coef        | 0.000482 |
|    ent_coef_loss   | 0.0183   |
|    learning_rate   | 0.0003   |
|    n_updates       | 41507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.62    |
| time/              |          |
|    episodes        | 1384     |
|    fps             | 172      |
|    time_elapsed    | 1925     |
|    total_timesteps | 332160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 1388     |
|    fps             | 171      |
|    time_elapsed    | 1945     |
|    total_timesteps | 334080   |
| train/             |          |
|    actor_loss      | -0.831   |
|    critic_loss     | 0.000133 |
|    ent_coef        | 0.000477 |
|    ent_coef_loss   | -0.128   |
|    learning_rate   | 0.0003   |
|    n_updates       | 41747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 1392     |
|    fps             | 171      |
|    time_elapsed    | 1945     |
|    total_timesteps | 334080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.614   |
| time/              |          |
|    episodes        | 1396     |
|    fps             | 170      |
|    time_elapsed    | 1968     |
|    total_timesteps | 336000   |
| train/             |          |
|    actor_loss      | -0.82    |
|    critic_loss     | 0.00014  |
|    ent_coef        | 0.000469 |
|    ent_coef_loss   | -0.19    |
|    learning_rate   | 0.0003   |
|    n_updates       | 41987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.614   |
| time/              |          |
|    episodes        | 1400     |
|    fps             | 170      |
|    time_elapsed    | 1968     |
|    total_timesteps | 336000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 1404     |
|    fps             | 169      |
|    time_elapsed    | 1990     |
|    total_timesteps | 337920   |
| train/             |          |
|    actor_loss      | -0.812   |
|    critic_loss     | 0.000195 |
|    ent_coef        | 0.000468 |
|    ent_coef_loss   | -0.109   |
|    learning_rate   | 0.0003   |
|    n_updates       | 42227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 1408     |
|    fps             | 169      |
|    time_elapsed    | 1990     |
|    total_timesteps | 337920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.617   |
| time/              |          |
|    episodes        | 1412     |
|    fps             | 168      |
|    time_elapsed    | 2011     |
|    total_timesteps | 339840   |
| train/             |          |
|    actor_loss      | -0.796   |
|    critic_loss     | 0.00021  |
|    ent_coef        | 0.000462 |
|    ent_coef_loss   | 0.0553   |
|    learning_rate   | 0.0003   |
|    n_updates       | 42467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.617   |
| time/              |          |
|    episodes        | 1416     |
|    fps             | 168      |
|    time_elapsed    | 2011     |
|    total_timesteps | 339840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.631   |
| time/              |          |
|    episodes        | 1420     |
|    fps             | 168      |
|    time_elapsed    | 2027     |
|    total_timesteps | 341760   |
| train/             |          |
|    actor_loss      | -0.776   |
|    critic_loss     | 0.000143 |
|    ent_coef        | 0.000465 |
|    ent_coef_loss   | 0.156    |
|    learning_rate   | 0.0003   |
|    n_updates       | 42707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.631   |
| time/              |          |
|    episodes        | 1424     |
|    fps             | 168      |
|    time_elapsed    | 2027     |
|    total_timesteps | 341760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.656   |
| time/              |          |
|    episodes        | 1428     |
|    fps             | 167      |
|    time_elapsed    | 2057     |
|    total_timesteps | 343680   |
| train/             |          |
|    actor_loss      | -0.768   |
|    critic_loss     | 0.000142 |
|    ent_coef        | 0.000462 |
|    ent_coef_loss   | -0.533   |
|    learning_rate   | 0.0003   |
|    n_updates       | 42947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.656   |
| time/              |          |
|    episodes        | 1432     |
|    fps             | 167      |
|    time_elapsed    | 2057     |
|    total_timesteps | 343680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 1436     |
|    fps             | 166      |
|    time_elapsed    | 2071     |
|    total_timesteps | 345600   |
| train/             |          |
|    actor_loss      | -0.757   |
|    critic_loss     | 0.000154 |
|    ent_coef        | 0.000452 |
|    ent_coef_loss   | -0.66    |
|    learning_rate   | 0.0003   |
|    n_updates       | 43187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 1440     |
|    fps             | 166      |
|    time_elapsed    | 2071     |
|    total_timesteps | 345600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 1444     |
|    fps             | 166      |
|    time_elapsed    | 2084     |
|    total_timesteps | 347520   |
| train/             |          |
|    actor_loss      | -0.739   |
|    critic_loss     | 0.000138 |
|    ent_coef        | 0.000446 |
|    ent_coef_loss   | -0.0561  |
|    learning_rate   | 0.0003   |
|    n_updates       | 43427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 1448     |
|    fps             | 166      |
|    time_elapsed    | 2084     |
|    total_timesteps | 347520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.642   |
| time/              |          |
|    episodes        | 1452     |
|    fps             | 165      |
|    time_elapsed    | 2110     |
|    total_timesteps | 349440   |
| train/             |          |
|    actor_loss      | -0.733   |
|    critic_loss     | 0.000172 |
|    ent_coef        | 0.000432 |
|    ent_coef_loss   | -0.553   |
|    learning_rate   | 0.0003   |
|    n_updates       | 43667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.642   |
| time/              |          |
|    episodes        | 1456     |
|    fps             | 165      |
|    time_elapsed    | 2110     |
|    total_timesteps | 349440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.663   |
| time/              |          |
|    episodes        | 1460     |
|    fps             | 164      |
|    time_elapsed    | 2133     |
|    total_timesteps | 351360   |
| train/             |          |
|    actor_loss      | -0.72    |
|    critic_loss     | 0.000119 |
|    ent_coef        | 0.000407 |
|    ent_coef_loss   | 0.237    |
|    learning_rate   | 0.0003   |
|    n_updates       | 43907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.663   |
| time/              |          |
|    episodes        | 1464     |
|    fps             | 164      |
|    time_elapsed    | 2133     |
|    total_timesteps | 351360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.635   |
| time/              |          |
|    episodes        | 1468     |
|    fps             | 164      |
|    time_elapsed    | 2149     |
|    total_timesteps | 353280   |
| train/             |          |
|    actor_loss      | -0.706   |
|    critic_loss     | 0.00018  |
|    ent_coef        | 0.000407 |
|    ent_coef_loss   | -0.0855  |
|    learning_rate   | 0.0003   |
|    n_updates       | 44147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.635   |
| time/              |          |
|    episodes        | 1472     |
|    fps             | 164      |
|    time_elapsed    | 2149     |
|    total_timesteps | 353280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.648   |
| time/              |          |
|    episodes        | 1476     |
|    fps             | 163      |
|    time_elapsed    | 2166     |
|    total_timesteps | 355200   |
| train/             |          |
|    actor_loss      | -0.698   |
|    critic_loss     | 0.000147 |
|    ent_coef        | 0.000406 |
|    ent_coef_loss   | -0.164   |
|    learning_rate   | 0.0003   |
|    n_updates       | 44387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.648   |
| time/              |          |
|    episodes        | 1480     |
|    fps             | 163      |
|    time_elapsed    | 2166     |
|    total_timesteps | 355200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.634   |
| time/              |          |
|    episodes        | 1484     |
|    fps             | 163      |
|    time_elapsed    | 2184     |
|    total_timesteps | 357120   |
| train/             |          |
|    actor_loss      | -0.684   |
|    critic_loss     | 0.000203 |
|    ent_coef        | 0.000398 |
|    ent_coef_loss   | 0.87     |
|    learning_rate   | 0.0003   |
|    n_updates       | 44627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.634   |
| time/              |          |
|    episodes        | 1488     |
|    fps             | 163      |
|    time_elapsed    | 2184     |
|    total_timesteps | 357120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.661   |
| time/              |          |
|    episodes        | 1492     |
|    fps             | 162      |
|    time_elapsed    | 2202     |
|    total_timesteps | 359040   |
| train/             |          |
|    actor_loss      | -0.66    |
|    critic_loss     | 0.000134 |
|    ent_coef        | 0.000391 |
|    ent_coef_loss   | -0.339   |
|    learning_rate   | 0.0003   |
|    n_updates       | 44867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.661   |
| time/              |          |
|    episodes        | 1496     |
|    fps             | 162      |
|    time_elapsed    | 2202     |
|    total_timesteps | 359040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 1500     |
|    fps             | 162      |
|    time_elapsed    | 2218     |
|    total_timesteps | 360960   |
| train/             |          |
|    actor_loss      | -0.652   |
|    critic_loss     | 0.000131 |
|    ent_coef        | 0.000402 |
|    ent_coef_loss   | 1.06     |
|    learning_rate   | 0.0003   |
|    n_updates       | 45107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 1504     |
|    fps             | 162      |
|    time_elapsed    | 2218     |
|    total_timesteps | 360960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.663   |
| time/              |          |
|    episodes        | 1508     |
|    fps             | 162      |
|    time_elapsed    | 2235     |
|    total_timesteps | 362880   |
| train/             |          |
|    actor_loss      | -0.649   |
|    critic_loss     | 0.000168 |
|    ent_coef        | 0.000404 |
|    ent_coef_loss   | 0.447    |
|    learning_rate   | 0.0003   |
|    n_updates       | 45347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.663   |
| time/              |          |
|    episodes        | 1512     |
|    fps             | 162      |
|    time_elapsed    | 2235     |
|    total_timesteps | 362880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1516     |
|    fps             | 161      |
|    time_elapsed    | 2254     |
|    total_timesteps | 364800   |
| train/             |          |
|    actor_loss      | -0.63    |
|    critic_loss     | 0.000174 |
|    ent_coef        | 0.000417 |
|    ent_coef_loss   | -0.336   |
|    learning_rate   | 0.0003   |
|    n_updates       | 45587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1520     |
|    fps             | 161      |
|    time_elapsed    | 2254     |
|    total_timesteps | 364800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1524     |
|    fps             | 161      |
|    time_elapsed    | 2271     |
|    total_timesteps | 366720   |
| train/             |          |
|    actor_loss      | -0.626   |
|    critic_loss     | 0.000151 |
|    ent_coef        | 0.00042  |
|    ent_coef_loss   | 0.178    |
|    learning_rate   | 0.0003   |
|    n_updates       | 45827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 1528     |
|    fps             | 161      |
|    time_elapsed    | 2271     |
|    total_timesteps | 366720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.68    |
| time/              |          |
|    episodes        | 1532     |
|    fps             | 160      |
|    time_elapsed    | 2291     |
|    total_timesteps | 368640   |
| train/             |          |
|    actor_loss      | -0.604   |
|    critic_loss     | 0.000169 |
|    ent_coef        | 0.000429 |
|    ent_coef_loss   | -0.458   |
|    learning_rate   | 0.0003   |
|    n_updates       | 46067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.68    |
| time/              |          |
|    episodes        | 1536     |
|    fps             | 160      |
|    time_elapsed    | 2291     |
|    total_timesteps | 368640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.694   |
| time/              |          |
|    episodes        | 1540     |
|    fps             | 160      |
|    time_elapsed    | 2315     |
|    total_timesteps | 370560   |
| train/             |          |
|    actor_loss      | -0.605   |
|    critic_loss     | 0.000138 |
|    ent_coef        | 0.000434 |
|    ent_coef_loss   | 0.347    |
|    learning_rate   | 0.0003   |
|    n_updates       | 46307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.694   |
| time/              |          |
|    episodes        | 1544     |
|    fps             | 160      |
|    time_elapsed    | 2315     |
|    total_timesteps | 370560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.717   |
| time/              |          |
|    episodes        | 1548     |
|    fps             | 159      |
|    time_elapsed    | 2335     |
|    total_timesteps | 372480   |
| train/             |          |
|    actor_loss      | -0.588   |
|    critic_loss     | 0.000138 |
|    ent_coef        | 0.00043  |
|    ent_coef_loss   | 0.0899   |
|    learning_rate   | 0.0003   |
|    n_updates       | 46547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.717   |
| time/              |          |
|    episodes        | 1552     |
|    fps             | 159      |
|    time_elapsed    | 2335     |
|    total_timesteps | 372480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.684   |
| time/              |          |
|    episodes        | 1556     |
|    fps             | 159      |
|    time_elapsed    | 2354     |
|    total_timesteps | 374400   |
| train/             |          |
|    actor_loss      | -0.583   |
|    critic_loss     | 0.000157 |
|    ent_coef        | 0.000423 |
|    ent_coef_loss   | -0.111   |
|    learning_rate   | 0.0003   |
|    n_updates       | 46787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.684   |
| time/              |          |
|    episodes        | 1560     |
|    fps             | 159      |
|    time_elapsed    | 2354     |
|    total_timesteps | 374400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.68    |
| time/              |          |
|    episodes        | 1564     |
|    fps             | 158      |
|    time_elapsed    | 2371     |
|    total_timesteps | 376320   |
| train/             |          |
|    actor_loss      | -0.566   |
|    critic_loss     | 0.000149 |
|    ent_coef        | 0.00042  |
|    ent_coef_loss   | 0.0573   |
|    learning_rate   | 0.0003   |
|    n_updates       | 47027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.68    |
| time/              |          |
|    episodes        | 1568     |
|    fps             | 158      |
|    time_elapsed    | 2371     |
|    total_timesteps | 376320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.677   |
| time/              |          |
|    episodes        | 1572     |
|    fps             | 157      |
|    time_elapsed    | 2396     |
|    total_timesteps | 378240   |
| train/             |          |
|    actor_loss      | -0.557   |
|    critic_loss     | 0.000153 |
|    ent_coef        | 0.000429 |
|    ent_coef_loss   | 0.361    |
|    learning_rate   | 0.0003   |
|    n_updates       | 47267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.677   |
| time/              |          |
|    episodes        | 1576     |
|    fps             | 157      |
|    time_elapsed    | 2396     |
|    total_timesteps | 378240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.677   |
| time/              |          |
|    episodes        | 1580     |
|    fps             | 157      |
|    time_elapsed    | 2414     |
|    total_timesteps | 380160   |
| train/             |          |
|    actor_loss      | -0.553   |
|    critic_loss     | 0.000141 |
|    ent_coef        | 0.000431 |
|    ent_coef_loss   | 0.394    |
|    learning_rate   | 0.0003   |
|    n_updates       | 47507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.677   |
| time/              |          |
|    episodes        | 1584     |
|    fps             | 157      |
|    time_elapsed    | 2414     |
|    total_timesteps | 380160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.678   |
| time/              |          |
|    episodes        | 1588     |
|    fps             | 157      |
|    time_elapsed    | 2429     |
|    total_timesteps | 382080   |
| train/             |          |
|    actor_loss      | -0.541   |
|    critic_loss     | 0.000176 |
|    ent_coef        | 0.000434 |
|    ent_coef_loss   | 0.367    |
|    learning_rate   | 0.0003   |
|    n_updates       | 47747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.678   |
| time/              |          |
|    episodes        | 1592     |
|    fps             | 157      |
|    time_elapsed    | 2429     |
|    total_timesteps | 382080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.651   |
| time/              |          |
|    episodes        | 1596     |
|    fps             | 156      |
|    time_elapsed    | 2453     |
|    total_timesteps | 384000   |
| train/             |          |
|    actor_loss      | -0.538   |
|    critic_loss     | 0.000154 |
|    ent_coef        | 0.000441 |
|    ent_coef_loss   | 0.583    |
|    learning_rate   | 0.0003   |
|    n_updates       | 47987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.651   |
| time/              |          |
|    episodes        | 1600     |
|    fps             | 156      |
|    time_elapsed    | 2453     |
|    total_timesteps | 384000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.641   |
| time/              |          |
|    episodes        | 1604     |
|    fps             | 156      |
|    time_elapsed    | 2468     |
|    total_timesteps | 385920   |
| train/             |          |
|    actor_loss      | -0.52    |
|    critic_loss     | 0.000147 |
|    ent_coef        | 0.000444 |
|    ent_coef_loss   | -0.347   |
|    learning_rate   | 0.0003   |
|    n_updates       | 48227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.641   |
| time/              |          |
|    episodes        | 1608     |
|    fps             | 156      |
|    time_elapsed    | 2468     |
|    total_timesteps | 385920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.675   |
| time/              |          |
|    episodes        | 1612     |
|    fps             | 155      |
|    time_elapsed    | 2492     |
|    total_timesteps | 387840   |
| train/             |          |
|    actor_loss      | -0.503   |
|    critic_loss     | 0.000144 |
|    ent_coef        | 0.000448 |
|    ent_coef_loss   | -0.476   |
|    learning_rate   | 0.0003   |
|    n_updates       | 48467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.675   |
| time/              |          |
|    episodes        | 1616     |
|    fps             | 155      |
|    time_elapsed    | 2492     |
|    total_timesteps | 387840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.662   |
| time/              |          |
|    episodes        | 1620     |
|    fps             | 155      |
|    time_elapsed    | 2514     |
|    total_timesteps | 389760   |
| train/             |          |
|    actor_loss      | -0.501   |
|    critic_loss     | 0.000129 |
|    ent_coef        | 0.000447 |
|    ent_coef_loss   | -0.998   |
|    learning_rate   | 0.0003   |
|    n_updates       | 48707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.662   |
| time/              |          |
|    episodes        | 1624     |
|    fps             | 155      |
|    time_elapsed    | 2514     |
|    total_timesteps | 389760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.627   |
| time/              |          |
|    episodes        | 1628     |
|    fps             | 154      |
|    time_elapsed    | 2532     |
|    total_timesteps | 391680   |
| train/             |          |
|    actor_loss      | -0.489   |
|    critic_loss     | 0.000146 |
|    ent_coef        | 0.000451 |
|    ent_coef_loss   | -0.241   |
|    learning_rate   | 0.0003   |
|    n_updates       | 48947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.627   |
| time/              |          |
|    episodes        | 1632     |
|    fps             | 154      |
|    time_elapsed    | 2532     |
|    total_timesteps | 391680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.626   |
| time/              |          |
|    episodes        | 1636     |
|    fps             | 154      |
|    time_elapsed    | 2543     |
|    total_timesteps | 393600   |
| train/             |          |
|    actor_loss      | -0.488   |
|    critic_loss     | 0.000142 |
|    ent_coef        | 0.000449 |
|    ent_coef_loss   | 0.081    |
|    learning_rate   | 0.0003   |
|    n_updates       | 49187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.626   |
| time/              |          |
|    episodes        | 1640     |
|    fps             | 154      |
|    time_elapsed    | 2543     |
|    total_timesteps | 393600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.631   |
| time/              |          |
|    episodes        | 1644     |
|    fps             | 154      |
|    time_elapsed    | 2563     |
|    total_timesteps | 395520   |
| train/             |          |
|    actor_loss      | -0.474   |
|    critic_loss     | 0.000156 |
|    ent_coef        | 0.00045  |
|    ent_coef_loss   | -0.224   |
|    learning_rate   | 0.0003   |
|    n_updates       | 49427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.631   |
| time/              |          |
|    episodes        | 1648     |
|    fps             | 154      |
|    time_elapsed    | 2563     |
|    total_timesteps | 395520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.621   |
| time/              |          |
|    episodes        | 1652     |
|    fps             | 153      |
|    time_elapsed    | 2582     |
|    total_timesteps | 397440   |
| train/             |          |
|    actor_loss      | -0.459   |
|    critic_loss     | 0.000123 |
|    ent_coef        | 0.000454 |
|    ent_coef_loss   | -0.0447  |
|    learning_rate   | 0.0003   |
|    n_updates       | 49667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.621   |
| time/              |          |
|    episodes        | 1656     |
|    fps             | 153      |
|    time_elapsed    | 2582     |
|    total_timesteps | 397440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.622   |
| time/              |          |
|    episodes        | 1660     |
|    fps             | 153      |
|    time_elapsed    | 2597     |
|    total_timesteps | 399360   |
| train/             |          |
|    actor_loss      | -0.454   |
|    critic_loss     | 0.000152 |
|    ent_coef        | 0.000458 |
|    ent_coef_loss   | -0.368   |
|    learning_rate   | 0.0003   |
|    n_updates       | 49907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.622   |
| time/              |          |
|    episodes        | 1664     |
|    fps             | 153      |
|    time_elapsed    | 2597     |
|    total_timesteps | 399360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.603   |
| time/              |          |
|    episodes        | 1668     |
|    fps             | 153      |
|    time_elapsed    | 2617     |
|    total_timesteps | 401280   |
| train/             |          |
|    actor_loss      | -0.444   |
|    critic_loss     | 0.000165 |
|    ent_coef        | 0.000458 |
|    ent_coef_loss   | 1.32     |
|    learning_rate   | 0.0003   |
|    n_updates       | 50147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.603   |
| time/              |          |
|    episodes        | 1672     |
|    fps             | 153      |
|    time_elapsed    | 2617     |
|    total_timesteps | 401280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.607   |
| time/              |          |
|    episodes        | 1676     |
|    fps             | 152      |
|    time_elapsed    | 2641     |
|    total_timesteps | 403200   |
| train/             |          |
|    actor_loss      | -0.437   |
|    critic_loss     | 0.000153 |
|    ent_coef        | 0.000468 |
|    ent_coef_loss   | 0.233    |
|    learning_rate   | 0.0003   |
|    n_updates       | 50387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.607   |
| time/              |          |
|    episodes        | 1680     |
|    fps             | 152      |
|    time_elapsed    | 2641     |
|    total_timesteps | 403200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.609   |
| time/              |          |
|    episodes        | 1684     |
|    fps             | 151      |
|    time_elapsed    | 2669     |
|    total_timesteps | 405120   |
| train/             |          |
|    actor_loss      | -0.431   |
|    critic_loss     | 0.000139 |
|    ent_coef        | 0.000469 |
|    ent_coef_loss   | 0.0106   |
|    learning_rate   | 0.0003   |
|    n_updates       | 50627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.609   |
| time/              |          |
|    episodes        | 1688     |
|    fps             | 151      |
|    time_elapsed    | 2669     |
|    total_timesteps | 405120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.593   |
| time/              |          |
|    episodes        | 1692     |
|    fps             | 151      |
|    time_elapsed    | 2686     |
|    total_timesteps | 407040   |
| train/             |          |
|    actor_loss      | -0.411   |
|    critic_loss     | 0.000146 |
|    ent_coef        | 0.000472 |
|    ent_coef_loss   | 0.559    |
|    learning_rate   | 0.0003   |
|    n_updates       | 50867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.593   |
| time/              |          |
|    episodes        | 1696     |
|    fps             | 151      |
|    time_elapsed    | 2686     |
|    total_timesteps | 407040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.6     |
| time/              |          |
|    episodes        | 1700     |
|    fps             | 151      |
|    time_elapsed    | 2706     |
|    total_timesteps | 408960   |
| train/             |          |
|    actor_loss      | -0.411   |
|    critic_loss     | 0.000128 |
|    ent_coef        | 0.000487 |
|    ent_coef_loss   | -0.827   |
|    learning_rate   | 0.0003   |
|    n_updates       | 51107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.6     |
| time/              |          |
|    episodes        | 1704     |
|    fps             | 151      |
|    time_elapsed    | 2706     |
|    total_timesteps | 408960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.597   |
| time/              |          |
|    episodes        | 1708     |
|    fps             | 150      |
|    time_elapsed    | 2725     |
|    total_timesteps | 410880   |
| train/             |          |
|    actor_loss      | -0.393   |
|    critic_loss     | 0.000159 |
|    ent_coef        | 0.000482 |
|    ent_coef_loss   | -0.573   |
|    learning_rate   | 0.0003   |
|    n_updates       | 51347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.597   |
| time/              |          |
|    episodes        | 1712     |
|    fps             | 150      |
|    time_elapsed    | 2725     |
|    total_timesteps | 410880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 1716     |
|    fps             | 150      |
|    time_elapsed    | 2742     |
|    total_timesteps | 412800   |
| train/             |          |
|    actor_loss      | -0.385   |
|    critic_loss     | 0.000148 |
|    ent_coef        | 0.000471 |
|    ent_coef_loss   | 0.955    |
|    learning_rate   | 0.0003   |
|    n_updates       | 51587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 1720     |
|    fps             | 150      |
|    time_elapsed    | 2742     |
|    total_timesteps | 412800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.588   |
| time/              |          |
|    episodes        | 1724     |
|    fps             | 150      |
|    time_elapsed    | 2759     |
|    total_timesteps | 414720   |
| train/             |          |
|    actor_loss      | -0.382   |
|    critic_loss     | 0.000138 |
|    ent_coef        | 0.000476 |
|    ent_coef_loss   | 0.035    |
|    learning_rate   | 0.0003   |
|    n_updates       | 51827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.588   |
| time/              |          |
|    episodes        | 1728     |
|    fps             | 150      |
|    time_elapsed    | 2759     |
|    total_timesteps | 414720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 1732     |
|    fps             | 149      |
|    time_elapsed    | 2777     |
|    total_timesteps | 416640   |
| train/             |          |
|    actor_loss      | -0.377   |
|    critic_loss     | 0.00015  |
|    ent_coef        | 0.000482 |
|    ent_coef_loss   | 0.182    |
|    learning_rate   | 0.0003   |
|    n_updates       | 52067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 1736     |
|    fps             | 149      |
|    time_elapsed    | 2777     |
|    total_timesteps | 416640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.59    |
| time/              |          |
|    episodes        | 1740     |
|    fps             | 149      |
|    time_elapsed    | 2793     |
|    total_timesteps | 418560   |
| train/             |          |
|    actor_loss      | -0.365   |
|    critic_loss     | 0.000133 |
|    ent_coef        | 0.000496 |
|    ent_coef_loss   | 0.921    |
|    learning_rate   | 0.0003   |
|    n_updates       | 52307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.59    |
| time/              |          |
|    episodes        | 1744     |
|    fps             | 149      |
|    time_elapsed    | 2793     |
|    total_timesteps | 418560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.581   |
| time/              |          |
|    episodes        | 1748     |
|    fps             | 149      |
|    time_elapsed    | 2810     |
|    total_timesteps | 420480   |
| train/             |          |
|    actor_loss      | -0.347   |
|    critic_loss     | 0.000186 |
|    ent_coef        | 0.000503 |
|    ent_coef_loss   | 0.841    |
|    learning_rate   | 0.0003   |
|    n_updates       | 52547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.581   |
| time/              |          |
|    episodes        | 1752     |
|    fps             | 149      |
|    time_elapsed    | 2810     |
|    total_timesteps | 420480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.579   |
| time/              |          |
|    episodes        | 1756     |
|    fps             | 149      |
|    time_elapsed    | 2825     |
|    total_timesteps | 422400   |
| train/             |          |
|    actor_loss      | -0.341   |
|    critic_loss     | 0.000154 |
|    ent_coef        | 0.000507 |
|    ent_coef_loss   | -0.291   |
|    learning_rate   | 0.0003   |
|    n_updates       | 52787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.579   |
| time/              |          |
|    episodes        | 1760     |
|    fps             | 149      |
|    time_elapsed    | 2825     |
|    total_timesteps | 422400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 1764     |
|    fps             | 149      |
|    time_elapsed    | 2838     |
|    total_timesteps | 424320   |
| train/             |          |
|    actor_loss      | -0.345   |
|    critic_loss     | 0.000164 |
|    ent_coef        | 0.000501 |
|    ent_coef_loss   | -0.219   |
|    learning_rate   | 0.0003   |
|    n_updates       | 53027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 1768     |
|    fps             | 149      |
|    time_elapsed    | 2838     |
|    total_timesteps | 424320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.594   |
| time/              |          |
|    episodes        | 1772     |
|    fps             | 149      |
|    time_elapsed    | 2855     |
|    total_timesteps | 426240   |
| train/             |          |
|    actor_loss      | -0.328   |
|    critic_loss     | 0.000162 |
|    ent_coef        | 0.0005   |
|    ent_coef_loss   | 0.381    |
|    learning_rate   | 0.0003   |
|    n_updates       | 53267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.594   |
| time/              |          |
|    episodes        | 1776     |
|    fps             | 149      |
|    time_elapsed    | 2855     |
|    total_timesteps | 426240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 1780     |
|    fps             | 149      |
|    time_elapsed    | 2872     |
|    total_timesteps | 428160   |
| train/             |          |
|    actor_loss      | -0.322   |
|    critic_loss     | 0.000125 |
|    ent_coef        | 0.000515 |
|    ent_coef_loss   | -0.714   |
|    learning_rate   | 0.0003   |
|    n_updates       | 53507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 1784     |
|    fps             | 149      |
|    time_elapsed    | 2872     |
|    total_timesteps | 428160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 1788     |
|    fps             | 148      |
|    time_elapsed    | 2888     |
|    total_timesteps | 430080   |
| train/             |          |
|    actor_loss      | -0.313   |
|    critic_loss     | 0.000174 |
|    ent_coef        | 0.000515 |
|    ent_coef_loss   | 0.675    |
|    learning_rate   | 0.0003   |
|    n_updates       | 53747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 1792     |
|    fps             | 148      |
|    time_elapsed    | 2888     |
|    total_timesteps | 430080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.605   |
| time/              |          |
|    episodes        | 1796     |
|    fps             | 148      |
|    time_elapsed    | 2911     |
|    total_timesteps | 432000   |
| train/             |          |
|    actor_loss      | -0.305   |
|    critic_loss     | 0.000153 |
|    ent_coef        | 0.000502 |
|    ent_coef_loss   | -0.334   |
|    learning_rate   | 0.0003   |
|    n_updates       | 53987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.605   |
| time/              |          |
|    episodes        | 1800     |
|    fps             | 148      |
|    time_elapsed    | 2911     |
|    total_timesteps | 432000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 1804     |
|    fps             | 148      |
|    time_elapsed    | 2927     |
|    total_timesteps | 433920   |
| train/             |          |
|    actor_loss      | -0.287   |
|    critic_loss     | 0.000152 |
|    ent_coef        | 0.000503 |
|    ent_coef_loss   | -0.482   |
|    learning_rate   | 0.0003   |
|    n_updates       | 54227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 1808     |
|    fps             | 148      |
|    time_elapsed    | 2927     |
|    total_timesteps | 433920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.595   |
| time/              |          |
|    episodes        | 1812     |
|    fps             | 147      |
|    time_elapsed    | 2947     |
|    total_timesteps | 435840   |
| train/             |          |
|    actor_loss      | -0.293   |
|    critic_loss     | 0.000125 |
|    ent_coef        | 0.000509 |
|    ent_coef_loss   | -1.37    |
|    learning_rate   | 0.0003   |
|    n_updates       | 54467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.595   |
| time/              |          |
|    episodes        | 1816     |
|    fps             | 147      |
|    time_elapsed    | 2948     |
|    total_timesteps | 435840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.608   |
| time/              |          |
|    episodes        | 1820     |
|    fps             | 147      |
|    time_elapsed    | 2969     |
|    total_timesteps | 437760   |
| train/             |          |
|    actor_loss      | -0.279   |
|    critic_loss     | 0.000164 |
|    ent_coef        | 0.000523 |
|    ent_coef_loss   | -0.229   |
|    learning_rate   | 0.0003   |
|    n_updates       | 54707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.608   |
| time/              |          |
|    episodes        | 1824     |
|    fps             | 147      |
|    time_elapsed    | 2969     |
|    total_timesteps | 437760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.588   |
| time/              |          |
|    episodes        | 1828     |
|    fps             | 147      |
|    time_elapsed    | 2986     |
|    total_timesteps | 439680   |
| train/             |          |
|    actor_loss      | -0.286   |
|    critic_loss     | 0.000148 |
|    ent_coef        | 0.000523 |
|    ent_coef_loss   | 0.184    |
|    learning_rate   | 0.0003   |
|    n_updates       | 54947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.588   |
| time/              |          |
|    episodes        | 1832     |
|    fps             | 147      |
|    time_elapsed    | 2986     |
|    total_timesteps | 439680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.587   |
| time/              |          |
|    episodes        | 1836     |
|    fps             | 147      |
|    time_elapsed    | 3001     |
|    total_timesteps | 441600   |
| train/             |          |
|    actor_loss      | -0.269   |
|    critic_loss     | 0.000148 |
|    ent_coef        | 0.000529 |
|    ent_coef_loss   | 0.522    |
|    learning_rate   | 0.0003   |
|    n_updates       | 55187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.587   |
| time/              |          |
|    episodes        | 1840     |
|    fps             | 147      |
|    time_elapsed    | 3001     |
|    total_timesteps | 441600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.615   |
| time/              |          |
|    episodes        | 1844     |
|    fps             | 147      |
|    time_elapsed    | 3016     |
|    total_timesteps | 443520   |
| train/             |          |
|    actor_loss      | -0.267   |
|    critic_loss     | 0.000169 |
|    ent_coef        | 0.000537 |
|    ent_coef_loss   | -0.0198  |
|    learning_rate   | 0.0003   |
|    n_updates       | 55427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.615   |
| time/              |          |
|    episodes        | 1848     |
|    fps             | 147      |
|    time_elapsed    | 3016     |
|    total_timesteps | 443520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.62    |
| time/              |          |
|    episodes        | 1852     |
|    fps             | 146      |
|    time_elapsed    | 3032     |
|    total_timesteps | 445440   |
| train/             |          |
|    actor_loss      | -0.256   |
|    critic_loss     | 0.00016  |
|    ent_coef        | 0.000522 |
|    ent_coef_loss   | 0.126    |
|    learning_rate   | 0.0003   |
|    n_updates       | 55667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.62    |
| time/              |          |
|    episodes        | 1856     |
|    fps             | 146      |
|    time_elapsed    | 3032     |
|    total_timesteps | 445440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.608   |
| time/              |          |
|    episodes        | 1860     |
|    fps             | 146      |
|    time_elapsed    | 3046     |
|    total_timesteps | 447360   |
| train/             |          |
|    actor_loss      | -0.244   |
|    critic_loss     | 0.000167 |
|    ent_coef        | 0.000522 |
|    ent_coef_loss   | -0.091   |
|    learning_rate   | 0.0003   |
|    n_updates       | 55907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.608   |
| time/              |          |
|    episodes        | 1864     |
|    fps             | 146      |
|    time_elapsed    | 3046     |
|    total_timesteps | 447360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.599   |
| time/              |          |
|    episodes        | 1868     |
|    fps             | 146      |
|    time_elapsed    | 3058     |
|    total_timesteps | 449280   |
| train/             |          |
|    actor_loss      | -0.246   |
|    critic_loss     | 0.000153 |
|    ent_coef        | 0.00052  |
|    ent_coef_loss   | 0.408    |
|    learning_rate   | 0.0003   |
|    n_updates       | 56147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.599   |
| time/              |          |
|    episodes        | 1872     |
|    fps             | 146      |
|    time_elapsed    | 3058     |
|    total_timesteps | 449280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.563   |
| time/              |          |
|    episodes        | 1876     |
|    fps             | 147      |
|    time_elapsed    | 3068     |
|    total_timesteps | 451200   |
| train/             |          |
|    actor_loss      | -0.227   |
|    critic_loss     | 0.000234 |
|    ent_coef        | 0.00052  |
|    ent_coef_loss   | 0.478    |
|    learning_rate   | 0.0003   |
|    n_updates       | 56387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.563   |
| time/              |          |
|    episodes        | 1880     |
|    fps             | 147      |
|    time_elapsed    | 3068     |
|    total_timesteps | 451200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.553   |
| time/              |          |
|    episodes        | 1884     |
|    fps             | 146      |
|    time_elapsed    | 3084     |
|    total_timesteps | 453120   |
| train/             |          |
|    actor_loss      | -0.235   |
|    critic_loss     | 0.000136 |
|    ent_coef        | 0.000516 |
|    ent_coef_loss   | 0.0237   |
|    learning_rate   | 0.0003   |
|    n_updates       | 56627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.553   |
| time/              |          |
|    episodes        | 1888     |
|    fps             | 146      |
|    time_elapsed    | 3084     |
|    total_timesteps | 453120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.561   |
| time/              |          |
|    episodes        | 1892     |
|    fps             | 146      |
|    time_elapsed    | 3100     |
|    total_timesteps | 455040   |
| train/             |          |
|    actor_loss      | -0.224   |
|    critic_loss     | 0.000161 |
|    ent_coef        | 0.000516 |
|    ent_coef_loss   | -0.335   |
|    learning_rate   | 0.0003   |
|    n_updates       | 56867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.561   |
| time/              |          |
|    episodes        | 1896     |
|    fps             | 146      |
|    time_elapsed    | 3100     |
|    total_timesteps | 455040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.565   |
| time/              |          |
|    episodes        | 1900     |
|    fps             | 146      |
|    time_elapsed    | 3128     |
|    total_timesteps | 456960   |
| train/             |          |
|    actor_loss      | -0.218   |
|    critic_loss     | 0.00015  |
|    ent_coef        | 0.000508 |
|    ent_coef_loss   | -0.563   |
|    learning_rate   | 0.0003   |
|    n_updates       | 57107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.565   |
| time/              |          |
|    episodes        | 1904     |
|    fps             | 146      |
|    time_elapsed    | 3128     |
|    total_timesteps | 456960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.56    |
| time/              |          |
|    episodes        | 1908     |
|    fps             | 145      |
|    time_elapsed    | 3151     |
|    total_timesteps | 458880   |
| train/             |          |
|    actor_loss      | -0.204   |
|    critic_loss     | 0.000181 |
|    ent_coef        | 0.000519 |
|    ent_coef_loss   | 0.29     |
|    learning_rate   | 0.0003   |
|    n_updates       | 57347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.56    |
| time/              |          |
|    episodes        | 1912     |
|    fps             | 145      |
|    time_elapsed    | 3151     |
|    total_timesteps | 458880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.56    |
| time/              |          |
|    episodes        | 1916     |
|    fps             | 145      |
|    time_elapsed    | 3174     |
|    total_timesteps | 460800   |
| train/             |          |
|    actor_loss      | -0.203   |
|    critic_loss     | 0.000202 |
|    ent_coef        | 0.000521 |
|    ent_coef_loss   | -0.0305  |
|    learning_rate   | 0.0003   |
|    n_updates       | 57587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.56    |
| time/              |          |
|    episodes        | 1920     |
|    fps             | 145      |
|    time_elapsed    | 3174     |
|    total_timesteps | 460800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.565   |
| time/              |          |
|    episodes        | 1924     |
|    fps             | 144      |
|    time_elapsed    | 3197     |
|    total_timesteps | 462720   |
| train/             |          |
|    actor_loss      | -0.203   |
|    critic_loss     | 0.000125 |
|    ent_coef        | 0.000511 |
|    ent_coef_loss   | 0.192    |
|    learning_rate   | 0.0003   |
|    n_updates       | 57827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.565   |
| time/              |          |
|    episodes        | 1928     |
|    fps             | 144      |
|    time_elapsed    | 3197     |
|    total_timesteps | 462720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.579   |
| time/              |          |
|    episodes        | 1932     |
|    fps             | 144      |
|    time_elapsed    | 3222     |
|    total_timesteps | 464640   |
| train/             |          |
|    actor_loss      | -0.176   |
|    critic_loss     | 0.00021  |
|    ent_coef        | 0.000512 |
|    ent_coef_loss   | -0.0791  |
|    learning_rate   | 0.0003   |
|    n_updates       | 58067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.579   |
| time/              |          |
|    episodes        | 1936     |
|    fps             | 144      |
|    time_elapsed    | 3222     |
|    total_timesteps | 464640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.549   |
| time/              |          |
|    episodes        | 1940     |
|    fps             | 143      |
|    time_elapsed    | 3244     |
|    total_timesteps | 466560   |
| train/             |          |
|    actor_loss      | -0.181   |
|    critic_loss     | 0.000203 |
|    ent_coef        | 0.000524 |
|    ent_coef_loss   | -0.012   |
|    learning_rate   | 0.0003   |
|    n_updates       | 58307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.549   |
| time/              |          |
|    episodes        | 1944     |
|    fps             | 143      |
|    time_elapsed    | 3244     |
|    total_timesteps | 466560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.532   |
| time/              |          |
|    episodes        | 1948     |
|    fps             | 143      |
|    time_elapsed    | 3265     |
|    total_timesteps | 468480   |
| train/             |          |
|    actor_loss      | -0.17    |
|    critic_loss     | 0.00014  |
|    ent_coef        | 0.000553 |
|    ent_coef_loss   | -0.506   |
|    learning_rate   | 0.0003   |
|    n_updates       | 58547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.532   |
| time/              |          |
|    episodes        | 1952     |
|    fps             | 143      |
|    time_elapsed    | 3265     |
|    total_timesteps | 468480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.534   |
| time/              |          |
|    episodes        | 1956     |
|    fps             | 143      |
|    time_elapsed    | 3288     |
|    total_timesteps | 470400   |
| train/             |          |
|    actor_loss      | -0.17    |
|    critic_loss     | 0.000136 |
|    ent_coef        | 0.000559 |
|    ent_coef_loss   | -0.81    |
|    learning_rate   | 0.0003   |
|    n_updates       | 58787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.534   |
| time/              |          |
|    episodes        | 1960     |
|    fps             | 143      |
|    time_elapsed    | 3288     |
|    total_timesteps | 470400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.548   |
| time/              |          |
|    episodes        | 1964     |
|    fps             | 142      |
|    time_elapsed    | 3310     |
|    total_timesteps | 472320   |
| train/             |          |
|    actor_loss      | -0.173   |
|    critic_loss     | 0.000163 |
|    ent_coef        | 0.000559 |
|    ent_coef_loss   | -0.505   |
|    learning_rate   | 0.0003   |
|    n_updates       | 59027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.548   |
| time/              |          |
|    episodes        | 1968     |
|    fps             | 142      |
|    time_elapsed    | 3310     |
|    total_timesteps | 472320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.548   |
| time/              |          |
|    episodes        | 1972     |
|    fps             | 142      |
|    time_elapsed    | 3333     |
|    total_timesteps | 474240   |
| train/             |          |
|    actor_loss      | -0.158   |
|    critic_loss     | 0.000185 |
|    ent_coef        | 0.000566 |
|    ent_coef_loss   | 0.85     |
|    learning_rate   | 0.0003   |
|    n_updates       | 59267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.548   |
| time/              |          |
|    episodes        | 1976     |
|    fps             | 142      |
|    time_elapsed    | 3333     |
|    total_timesteps | 474240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.563   |
| time/              |          |
|    episodes        | 1980     |
|    fps             | 141      |
|    time_elapsed    | 3356     |
|    total_timesteps | 476160   |
| train/             |          |
|    actor_loss      | -0.155   |
|    critic_loss     | 0.000151 |
|    ent_coef        | 0.000568 |
|    ent_coef_loss   | 0.393    |
|    learning_rate   | 0.0003   |
|    n_updates       | 59507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.563   |
| time/              |          |
|    episodes        | 1984     |
|    fps             | 141      |
|    time_elapsed    | 3356     |
|    total_timesteps | 476160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.57    |
| time/              |          |
|    episodes        | 1988     |
|    fps             | 141      |
|    time_elapsed    | 3379     |
|    total_timesteps | 478080   |
| train/             |          |
|    actor_loss      | -0.158   |
|    critic_loss     | 0.000189 |
|    ent_coef        | 0.000577 |
|    ent_coef_loss   | 0.86     |
|    learning_rate   | 0.0003   |
|    n_updates       | 59747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.57    |
| time/              |          |
|    episodes        | 1992     |
|    fps             | 141      |
|    time_elapsed    | 3380     |
|    total_timesteps | 478080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.569   |
| time/              |          |
|    episodes        | 1996     |
|    fps             | 141      |
|    time_elapsed    | 3397     |
|    total_timesteps | 480000   |
| train/             |          |
|    actor_loss      | -0.141   |
|    critic_loss     | 0.000177 |
|    ent_coef        | 0.000572 |
|    ent_coef_loss   | 0.326    |
|    learning_rate   | 0.0003   |
|    n_updates       | 59987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.569   |
| time/              |          |
|    episodes        | 2000     |
|    fps             | 141      |
|    time_elapsed    | 3397     |
|    total_timesteps | 480000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.562   |
| time/              |          |
|    episodes        | 2004     |
|    fps             | 141      |
|    time_elapsed    | 3416     |
|    total_timesteps | 481920   |
| train/             |          |
|    actor_loss      | -0.134   |
|    critic_loss     | 0.000174 |
|    ent_coef        | 0.000574 |
|    ent_coef_loss   | 0.434    |
|    learning_rate   | 0.0003   |
|    n_updates       | 60227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.562   |
| time/              |          |
|    episodes        | 2008     |
|    fps             | 141      |
|    time_elapsed    | 3416     |
|    total_timesteps | 481920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.595   |
| time/              |          |
|    episodes        | 2012     |
|    fps             | 140      |
|    time_elapsed    | 3437     |
|    total_timesteps | 483840   |
| train/             |          |
|    actor_loss      | -0.132   |
|    critic_loss     | 0.000193 |
|    ent_coef        | 0.00058  |
|    ent_coef_loss   | 0.521    |
|    learning_rate   | 0.0003   |
|    n_updates       | 60467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.595   |
| time/              |          |
|    episodes        | 2016     |
|    fps             | 140      |
|    time_elapsed    | 3437     |
|    total_timesteps | 483840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.646   |
| time/              |          |
|    episodes        | 2020     |
|    fps             | 140      |
|    time_elapsed    | 3460     |
|    total_timesteps | 485760   |
| train/             |          |
|    actor_loss      | -0.121   |
|    critic_loss     | 0.000144 |
|    ent_coef        | 0.000555 |
|    ent_coef_loss   | -1.03    |
|    learning_rate   | 0.0003   |
|    n_updates       | 60707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.646   |
| time/              |          |
|    episodes        | 2024     |
|    fps             | 140      |
|    time_elapsed    | 3460     |
|    total_timesteps | 485760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 2028     |
|    fps             | 139      |
|    time_elapsed    | 3489     |
|    total_timesteps | 487680   |
| train/             |          |
|    actor_loss      | -0.116   |
|    critic_loss     | 0.000169 |
|    ent_coef        | 0.000538 |
|    ent_coef_loss   | 2.2      |
|    learning_rate   | 0.0003   |
|    n_updates       | 60947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 2032     |
|    fps             | 139      |
|    time_elapsed    | 3489     |
|    total_timesteps | 487680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.711   |
| time/              |          |
|    episodes        | 2036     |
|    fps             | 139      |
|    time_elapsed    | 3507     |
|    total_timesteps | 489600   |
| train/             |          |
|    actor_loss      | -0.11    |
|    critic_loss     | 0.000201 |
|    ent_coef        | 0.000516 |
|    ent_coef_loss   | -0.758   |
|    learning_rate   | 0.0003   |
|    n_updates       | 61187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.711   |
| time/              |          |
|    episodes        | 2040     |
|    fps             | 139      |
|    time_elapsed    | 3507     |
|    total_timesteps | 489600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.8     |
| time/              |          |
|    episodes        | 2044     |
|    fps             | 139      |
|    time_elapsed    | 3526     |
|    total_timesteps | 491520   |
| train/             |          |
|    actor_loss      | -0.109   |
|    critic_loss     | 0.000175 |
|    ent_coef        | 0.000497 |
|    ent_coef_loss   | -0.924   |
|    learning_rate   | 0.0003   |
|    n_updates       | 61427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.8     |
| time/              |          |
|    episodes        | 2048     |
|    fps             | 139      |
|    time_elapsed    | 3526     |
|    total_timesteps | 491520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.851   |
| time/              |          |
|    episodes        | 2052     |
|    fps             | 139      |
|    time_elapsed    | 3548     |
|    total_timesteps | 493440   |
| train/             |          |
|    actor_loss      | -0.109   |
|    critic_loss     | 0.000113 |
|    ent_coef        | 0.000485 |
|    ent_coef_loss   | -0.102   |
|    learning_rate   | 0.0003   |
|    n_updates       | 61667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.851   |
| time/              |          |
|    episodes        | 2056     |
|    fps             | 139      |
|    time_elapsed    | 3548     |
|    total_timesteps | 493440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.898   |
| time/              |          |
|    episodes        | 2060     |
|    fps             | 138      |
|    time_elapsed    | 3572     |
|    total_timesteps | 495360   |
| train/             |          |
|    actor_loss      | -0.102   |
|    critic_loss     | 0.000148 |
|    ent_coef        | 0.000485 |
|    ent_coef_loss   | 0.205    |
|    learning_rate   | 0.0003   |
|    n_updates       | 61907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.898   |
| time/              |          |
|    episodes        | 2064     |
|    fps             | 138      |
|    time_elapsed    | 3572     |
|    total_timesteps | 495360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.918   |
| time/              |          |
|    episodes        | 2068     |
|    fps             | 138      |
|    time_elapsed    | 3596     |
|    total_timesteps | 497280   |
| train/             |          |
|    actor_loss      | -0.102   |
|    critic_loss     | 0.000136 |
|    ent_coef        | 0.000512 |
|    ent_coef_loss   | 0.161    |
|    learning_rate   | 0.0003   |
|    n_updates       | 62147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.918   |
| time/              |          |
|    episodes        | 2072     |
|    fps             | 138      |
|    time_elapsed    | 3596     |
|    total_timesteps | 497280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.926   |
| time/              |          |
|    episodes        | 2076     |
|    fps             | 138      |
|    time_elapsed    | 3615     |
|    total_timesteps | 499200   |
| train/             |          |
|    actor_loss      | -0.0938  |
|    critic_loss     | 0.000121 |
|    ent_coef        | 0.000543 |
|    ent_coef_loss   | 0.904    |
|    learning_rate   | 0.0003   |
|    n_updates       | 62387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.926   |
| time/              |          |
|    episodes        | 2080     |
|    fps             | 138      |
|    time_elapsed    | 3615     |
|    total_timesteps | 499200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.918   |
| time/              |          |
|    episodes        | 2084     |
|    fps             | 137      |
|    time_elapsed    | 3637     |
|    total_timesteps | 501120   |
| train/             |          |
|    actor_loss      | -0.0856  |
|    critic_loss     | 0.000129 |
|    ent_coef        | 0.000578 |
|    ent_coef_loss   | 0.256    |
|    learning_rate   | 0.0003   |
|    n_updates       | 62627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.918   |
| time/              |          |
|    episodes        | 2088     |
|    fps             | 137      |
|    time_elapsed    | 3637     |
|    total_timesteps | 501120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.932   |
| time/              |          |
|    episodes        | 2092     |
|    fps             | 137      |
|    time_elapsed    | 3658     |
|    total_timesteps | 503040   |
| train/             |          |
|    actor_loss      | -0.0816  |
|    critic_loss     | 0.000121 |
|    ent_coef        | 0.000607 |
|    ent_coef_loss   | 0.862    |
|    learning_rate   | 0.0003   |
|    n_updates       | 62867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.932   |
| time/              |          |
|    episodes        | 2096     |
|    fps             | 137      |
|    time_elapsed    | 3658     |
|    total_timesteps | 503040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.975   |
| time/              |          |
|    episodes        | 2100     |
|    fps             | 137      |
|    time_elapsed    | 3681     |
|    total_timesteps | 504960   |
| train/             |          |
|    actor_loss      | -0.0804  |
|    critic_loss     | 0.000141 |
|    ent_coef        | 0.000622 |
|    ent_coef_loss   | 0.183    |
|    learning_rate   | 0.0003   |
|    n_updates       | 63107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.975   |
| time/              |          |
|    episodes        | 2104     |
|    fps             | 137      |
|    time_elapsed    | 3681     |
|    total_timesteps | 504960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.971   |
| time/              |          |
|    episodes        | 2108     |
|    fps             | 137      |
|    time_elapsed    | 3699     |
|    total_timesteps | 506880   |
| train/             |          |
|    actor_loss      | -0.0682  |
|    critic_loss     | 0.000138 |
|    ent_coef        | 0.000632 |
|    ent_coef_loss   | 0.706    |
|    learning_rate   | 0.0003   |
|    n_updates       | 63347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.971   |
| time/              |          |
|    episodes        | 2112     |
|    fps             | 137      |
|    time_elapsed    | 3699     |
|    total_timesteps | 506880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.959   |
| time/              |          |
|    episodes        | 2116     |
|    fps             | 137      |
|    time_elapsed    | 3711     |
|    total_timesteps | 508800   |
| train/             |          |
|    actor_loss      | -0.0691  |
|    critic_loss     | 0.000231 |
|    ent_coef        | 0.000639 |
|    ent_coef_loss   | 0.877    |
|    learning_rate   | 0.0003   |
|    n_updates       | 63587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.959   |
| time/              |          |
|    episodes        | 2120     |
|    fps             | 137      |
|    time_elapsed    | 3711     |
|    total_timesteps | 508800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.873   |
| time/              |          |
|    episodes        | 2124     |
|    fps             | 137      |
|    time_elapsed    | 3725     |
|    total_timesteps | 510720   |
| train/             |          |
|    actor_loss      | -0.0608  |
|    critic_loss     | 0.000186 |
|    ent_coef        | 0.000642 |
|    ent_coef_loss   | -0.568   |
|    learning_rate   | 0.0003   |
|    n_updates       | 63827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.873   |
| time/              |          |
|    episodes        | 2128     |
|    fps             | 137      |
|    time_elapsed    | 3725     |
|    total_timesteps | 510720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.834   |
| time/              |          |
|    episodes        | 2132     |
|    fps             | 137      |
|    time_elapsed    | 3741     |
|    total_timesteps | 512640   |
| train/             |          |
|    actor_loss      | -0.0626  |
|    critic_loss     | 0.000131 |
|    ent_coef        | 0.000649 |
|    ent_coef_loss   | 0.304    |
|    learning_rate   | 0.0003   |
|    n_updates       | 64067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.834   |
| time/              |          |
|    episodes        | 2136     |
|    fps             | 137      |
|    time_elapsed    | 3741     |
|    total_timesteps | 512640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.843   |
| time/              |          |
|    episodes        | 2140     |
|    fps             | 136      |
|    time_elapsed    | 3763     |
|    total_timesteps | 514560   |
| train/             |          |
|    actor_loss      | -0.0527  |
|    critic_loss     | 0.000196 |
|    ent_coef        | 0.000647 |
|    ent_coef_loss   | -0.222   |
|    learning_rate   | 0.0003   |
|    n_updates       | 64307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.843   |
| time/              |          |
|    episodes        | 2144     |
|    fps             | 136      |
|    time_elapsed    | 3763     |
|    total_timesteps | 514560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.785   |
| time/              |          |
|    episodes        | 2148     |
|    fps             | 136      |
|    time_elapsed    | 3777     |
|    total_timesteps | 516480   |
| train/             |          |
|    actor_loss      | -0.0432  |
|    critic_loss     | 0.000163 |
|    ent_coef        | 0.000625 |
|    ent_coef_loss   | 0.217    |
|    learning_rate   | 0.0003   |
|    n_updates       | 64547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.785   |
| time/              |          |
|    episodes        | 2152     |
|    fps             | 136      |
|    time_elapsed    | 3777     |
|    total_timesteps | 516480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.789   |
| time/              |          |
|    episodes        | 2156     |
|    fps             | 136      |
|    time_elapsed    | 3797     |
|    total_timesteps | 518400   |
| train/             |          |
|    actor_loss      | -0.0463  |
|    critic_loss     | 0.000331 |
|    ent_coef        | 0.000626 |
|    ent_coef_loss   | 0.358    |
|    learning_rate   | 0.0003   |
|    n_updates       | 64787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.789   |
| time/              |          |
|    episodes        | 2160     |
|    fps             | 136      |
|    time_elapsed    | 3797     |
|    total_timesteps | 518400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.745   |
| time/              |          |
|    episodes        | 2164     |
|    fps             | 136      |
|    time_elapsed    | 3815     |
|    total_timesteps | 520320   |
| train/             |          |
|    actor_loss      | -0.0337  |
|    critic_loss     | 0.000151 |
|    ent_coef        | 0.000647 |
|    ent_coef_loss   | 0.661    |
|    learning_rate   | 0.0003   |
|    n_updates       | 65027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.745   |
| time/              |          |
|    episodes        | 2168     |
|    fps             | 136      |
|    time_elapsed    | 3815     |
|    total_timesteps | 520320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.733   |
| time/              |          |
|    episodes        | 2172     |
|    fps             | 136      |
|    time_elapsed    | 3837     |
|    total_timesteps | 522240   |
| train/             |          |
|    actor_loss      | -0.0311  |
|    critic_loss     | 0.000129 |
|    ent_coef        | 0.000671 |
|    ent_coef_loss   | 0.196    |
|    learning_rate   | 0.0003   |
|    n_updates       | 65267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.733   |
| time/              |          |
|    episodes        | 2176     |
|    fps             | 136      |
|    time_elapsed    | 3837     |
|    total_timesteps | 522240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.733   |
| time/              |          |
|    episodes        | 2180     |
|    fps             | 135      |
|    time_elapsed    | 3854     |
|    total_timesteps | 524160   |
| train/             |          |
|    actor_loss      | -0.0374  |
|    critic_loss     | 0.000166 |
|    ent_coef        | 0.000667 |
|    ent_coef_loss   | 0.199    |
|    learning_rate   | 0.0003   |
|    n_updates       | 65507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.733   |
| time/              |          |
|    episodes        | 2184     |
|    fps             | 135      |
|    time_elapsed    | 3854     |
|    total_timesteps | 524160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.743   |
| time/              |          |
|    episodes        | 2188     |
|    fps             | 135      |
|    time_elapsed    | 3872     |
|    total_timesteps | 526080   |
| train/             |          |
|    actor_loss      | -0.0327  |
|    critic_loss     | 0.000164 |
|    ent_coef        | 0.000646 |
|    ent_coef_loss   | 0.128    |
|    learning_rate   | 0.0003   |
|    n_updates       | 65747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.743   |
| time/              |          |
|    episodes        | 2192     |
|    fps             | 135      |
|    time_elapsed    | 3872     |
|    total_timesteps | 526080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.722   |
| time/              |          |
|    episodes        | 2196     |
|    fps             | 135      |
|    time_elapsed    | 3888     |
|    total_timesteps | 528000   |
| train/             |          |
|    actor_loss      | -0.0219  |
|    critic_loss     | 0.000146 |
|    ent_coef        | 0.000635 |
|    ent_coef_loss   | -0.172   |
|    learning_rate   | 0.0003   |
|    n_updates       | 65987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.722   |
| time/              |          |
|    episodes        | 2200     |
|    fps             | 135      |
|    time_elapsed    | 3888     |
|    total_timesteps | 528000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.681   |
| time/              |          |
|    episodes        | 2204     |
|    fps             | 135      |
|    time_elapsed    | 3907     |
|    total_timesteps | 529920   |
| train/             |          |
|    actor_loss      | -0.0114  |
|    critic_loss     | 0.000146 |
|    ent_coef        | 0.000621 |
|    ent_coef_loss   | -0.234   |
|    learning_rate   | 0.0003   |
|    n_updates       | 66227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.681   |
| time/              |          |
|    episodes        | 2208     |
|    fps             | 135      |
|    time_elapsed    | 3907     |
|    total_timesteps | 529920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.672   |
| time/              |          |
|    episodes        | 2212     |
|    fps             | 135      |
|    time_elapsed    | 3925     |
|    total_timesteps | 531840   |
| train/             |          |
|    actor_loss      | -0.0165  |
|    critic_loss     | 0.000155 |
|    ent_coef        | 0.000609 |
|    ent_coef_loss   | -0.0196  |
|    learning_rate   | 0.0003   |
|    n_updates       | 66467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.672   |
| time/              |          |
|    episodes        | 2216     |
|    fps             | 135      |
|    time_elapsed    | 3925     |
|    total_timesteps | 531840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.66    |
| time/              |          |
|    episodes        | 2220     |
|    fps             | 135      |
|    time_elapsed    | 3945     |
|    total_timesteps | 533760   |
| train/             |          |
|    actor_loss      | 0.00815  |
|    critic_loss     | 0.00018  |
|    ent_coef        | 0.000611 |
|    ent_coef_loss   | -0.205   |
|    learning_rate   | 0.0003   |
|    n_updates       | 66707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.66    |
| time/              |          |
|    episodes        | 2224     |
|    fps             | 135      |
|    time_elapsed    | 3945     |
|    total_timesteps | 533760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.664   |
| time/              |          |
|    episodes        | 2228     |
|    fps             | 135      |
|    time_elapsed    | 3963     |
|    total_timesteps | 535680   |
| train/             |          |
|    actor_loss      | -0.00459 |
|    critic_loss     | 0.000174 |
|    ent_coef        | 0.000603 |
|    ent_coef_loss   | 0.12     |
|    learning_rate   | 0.0003   |
|    n_updates       | 66947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.664   |
| time/              |          |
|    episodes        | 2232     |
|    fps             | 135      |
|    time_elapsed    | 3963     |
|    total_timesteps | 535680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 2236     |
|    fps             | 134      |
|    time_elapsed    | 3987     |
|    total_timesteps | 537600   |
| train/             |          |
|    actor_loss      | -0.0133  |
|    critic_loss     | 0.000135 |
|    ent_coef        | 0.000603 |
|    ent_coef_loss   | 0.209    |
|    learning_rate   | 0.0003   |
|    n_updates       | 67187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.673   |
| time/              |          |
|    episodes        | 2240     |
|    fps             | 134      |
|    time_elapsed    | 3987     |
|    total_timesteps | 537600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.636   |
| time/              |          |
|    episodes        | 2244     |
|    fps             | 134      |
|    time_elapsed    | 4005     |
|    total_timesteps | 539520   |
| train/             |          |
|    actor_loss      | 0.000328 |
|    critic_loss     | 0.000175 |
|    ent_coef        | 0.000601 |
|    ent_coef_loss   | -0.148   |
|    learning_rate   | 0.0003   |
|    n_updates       | 67427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.636   |
| time/              |          |
|    episodes        | 2248     |
|    fps             | 134      |
|    time_elapsed    | 4005     |
|    total_timesteps | 539520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.613   |
| time/              |          |
|    episodes        | 2252     |
|    fps             | 134      |
|    time_elapsed    | 4029     |
|    total_timesteps | 541440   |
| train/             |          |
|    actor_loss      | 0.00742  |
|    critic_loss     | 0.000136 |
|    ent_coef        | 0.000591 |
|    ent_coef_loss   | -0.822   |
|    learning_rate   | 0.0003   |
|    n_updates       | 67667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.613   |
| time/              |          |
|    episodes        | 2256     |
|    fps             | 134      |
|    time_elapsed    | 4029     |
|    total_timesteps | 541440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.613   |
| time/              |          |
|    episodes        | 2260     |
|    fps             | 134      |
|    time_elapsed    | 4049     |
|    total_timesteps | 543360   |
| train/             |          |
|    actor_loss      | 0.0173   |
|    critic_loss     | 0.000125 |
|    ent_coef        | 0.000551 |
|    ent_coef_loss   | 0.663    |
|    learning_rate   | 0.0003   |
|    n_updates       | 67907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.613   |
| time/              |          |
|    episodes        | 2264     |
|    fps             | 134      |
|    time_elapsed    | 4049     |
|    total_timesteps | 543360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.609   |
| time/              |          |
|    episodes        | 2268     |
|    fps             | 134      |
|    time_elapsed    | 4064     |
|    total_timesteps | 545280   |
| train/             |          |
|    actor_loss      | 0.00892  |
|    critic_loss     | 0.000143 |
|    ent_coef        | 0.000546 |
|    ent_coef_loss   | 0.468    |
|    learning_rate   | 0.0003   |
|    n_updates       | 68147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.609   |
| time/              |          |
|    episodes        | 2272     |
|    fps             | 134      |
|    time_elapsed    | 4064     |
|    total_timesteps | 545280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 2276     |
|    fps             | 134      |
|    time_elapsed    | 4075     |
|    total_timesteps | 547200   |
| train/             |          |
|    actor_loss      | 0.0196   |
|    critic_loss     | 0.000168 |
|    ent_coef        | 0.000553 |
|    ent_coef_loss   | -1.13    |
|    learning_rate   | 0.0003   |
|    n_updates       | 68387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 2280     |
|    fps             | 134      |
|    time_elapsed    | 4075     |
|    total_timesteps | 547200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.61    |
| time/              |          |
|    episodes        | 2284     |
|    fps             | 134      |
|    time_elapsed    | 4085     |
|    total_timesteps | 549120   |
| train/             |          |
|    actor_loss      | 0.0207   |
|    critic_loss     | 0.000166 |
|    ent_coef        | 0.000545 |
|    ent_coef_loss   | -0.143   |
|    learning_rate   | 0.0003   |
|    n_updates       | 68627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.61    |
| time/              |          |
|    episodes        | 2288     |
|    fps             | 134      |
|    time_elapsed    | 4085     |
|    total_timesteps | 549120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.62    |
| time/              |          |
|    episodes        | 2292     |
|    fps             | 134      |
|    time_elapsed    | 4100     |
|    total_timesteps | 551040   |
| train/             |          |
|    actor_loss      | 0.0258   |
|    critic_loss     | 0.000172 |
|    ent_coef        | 0.000539 |
|    ent_coef_loss   | -0.284   |
|    learning_rate   | 0.0003   |
|    n_updates       | 68867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.62    |
| time/              |          |
|    episodes        | 2296     |
|    fps             | 134      |
|    time_elapsed    | 4100     |
|    total_timesteps | 551040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.694   |
| time/              |          |
|    episodes        | 2300     |
|    fps             | 134      |
|    time_elapsed    | 4122     |
|    total_timesteps | 552960   |
| train/             |          |
|    actor_loss      | 0.0309   |
|    critic_loss     | 0.00392  |
|    ent_coef        | 0.000528 |
|    ent_coef_loss   | -0.215   |
|    learning_rate   | 0.0003   |
|    n_updates       | 69107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.694   |
| time/              |          |
|    episodes        | 2304     |
|    fps             | 134      |
|    time_elapsed    | 4122     |
|    total_timesteps | 552960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.68    |
| time/              |          |
|    episodes        | 2308     |
|    fps             | 133      |
|    time_elapsed    | 4141     |
|    total_timesteps | 554880   |
| train/             |          |
|    actor_loss      | 0.0387   |
|    critic_loss     | 0.000148 |
|    ent_coef        | 0.000495 |
|    ent_coef_loss   | -1.91    |
|    learning_rate   | 0.0003   |
|    n_updates       | 69347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.68    |
| time/              |          |
|    episodes        | 2312     |
|    fps             | 133      |
|    time_elapsed    | 4141     |
|    total_timesteps | 554880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.72    |
| time/              |          |
|    episodes        | 2316     |
|    fps             | 133      |
|    time_elapsed    | 4164     |
|    total_timesteps | 556800   |
| train/             |          |
|    actor_loss      | 0.0322   |
|    critic_loss     | 0.000165 |
|    ent_coef        | 0.000488 |
|    ent_coef_loss   | 0.517    |
|    learning_rate   | 0.0003   |
|    n_updates       | 69587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.72    |
| time/              |          |
|    episodes        | 2320     |
|    fps             | 133      |
|    time_elapsed    | 4164     |
|    total_timesteps | 556800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.73    |
| time/              |          |
|    episodes        | 2324     |
|    fps             | 133      |
|    time_elapsed    | 4181     |
|    total_timesteps | 558720   |
| train/             |          |
|    actor_loss      | 0.0421   |
|    critic_loss     | 0.000124 |
|    ent_coef        | 0.000522 |
|    ent_coef_loss   | 0.849    |
|    learning_rate   | 0.0003   |
|    n_updates       | 69827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.73    |
| time/              |          |
|    episodes        | 2328     |
|    fps             | 133      |
|    time_elapsed    | 4181     |
|    total_timesteps | 558720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.77    |
| time/              |          |
|    episodes        | 2332     |
|    fps             | 133      |
|    time_elapsed    | 4201     |
|    total_timesteps | 560640   |
| train/             |          |
|    actor_loss      | 0.0426   |
|    critic_loss     | 0.000152 |
|    ent_coef        | 0.00055  |
|    ent_coef_loss   | 1.01     |
|    learning_rate   | 0.0003   |
|    n_updates       | 70067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.77    |
| time/              |          |
|    episodes        | 2336     |
|    fps             | 133      |
|    time_elapsed    | 4201     |
|    total_timesteps | 560640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.76    |
| time/              |          |
|    episodes        | 2340     |
|    fps             | 133      |
|    time_elapsed    | 4218     |
|    total_timesteps | 562560   |
| train/             |          |
|    actor_loss      | 0.0538   |
|    critic_loss     | 0.000347 |
|    ent_coef        | 0.000597 |
|    ent_coef_loss   | 1.89     |
|    learning_rate   | 0.0003   |
|    n_updates       | 70307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.76    |
| time/              |          |
|    episodes        | 2344     |
|    fps             | 133      |
|    time_elapsed    | 4218     |
|    total_timesteps | 562560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.78    |
| time/              |          |
|    episodes        | 2348     |
|    fps             | 133      |
|    time_elapsed    | 4230     |
|    total_timesteps | 564480   |
| train/             |          |
|    actor_loss      | 0.05     |
|    critic_loss     | 0.000168 |
|    ent_coef        | 0.000655 |
|    ent_coef_loss   | 1.13     |
|    learning_rate   | 0.0003   |
|    n_updates       | 70547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.78    |
| time/              |          |
|    episodes        | 2352     |
|    fps             | 133      |
|    time_elapsed    | 4230     |
|    total_timesteps | 564480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.8     |
| time/              |          |
|    episodes        | 2356     |
|    fps             | 133      |
|    time_elapsed    | 4250     |
|    total_timesteps | 566400   |
| train/             |          |
|    actor_loss      | 0.0609   |
|    critic_loss     | 0.000245 |
|    ent_coef        | 0.000725 |
|    ent_coef_loss   | 1.39     |
|    learning_rate   | 0.0003   |
|    n_updates       | 70787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.8     |
| time/              |          |
|    episodes        | 2360     |
|    fps             | 133      |
|    time_elapsed    | 4250     |
|    total_timesteps | 566400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.79    |
| time/              |          |
|    episodes        | 2364     |
|    fps             | 132      |
|    time_elapsed    | 4281     |
|    total_timesteps | 568320   |
| train/             |          |
|    actor_loss      | 0.104    |
|    critic_loss     | 0.000259 |
|    ent_coef        | 0.000801 |
|    ent_coef_loss   | 2.39     |
|    learning_rate   | 0.0003   |
|    n_updates       | 71027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.79    |
| time/              |          |
|    episodes        | 2368     |
|    fps             | 132      |
|    time_elapsed    | 4281     |
|    total_timesteps | 568320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.79    |
| time/              |          |
|    episodes        | 2372     |
|    fps             | 132      |
|    time_elapsed    | 4306     |
|    total_timesteps | 570240   |
| train/             |          |
|    actor_loss      | 0.147    |
|    critic_loss     | 0.0136   |
|    ent_coef        | 0.000877 |
|    ent_coef_loss   | 1.49     |
|    learning_rate   | 0.0003   |
|    n_updates       | 71267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.79    |
| time/              |          |
|    episodes        | 2376     |
|    fps             | 132      |
|    time_elapsed    | 4306     |
|    total_timesteps | 570240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.81    |
| time/              |          |
|    episodes        | 2380     |
|    fps             | 132      |
|    time_elapsed    | 4331     |
|    total_timesteps | 572160   |
| train/             |          |
|    actor_loss      | 0.152    |
|    critic_loss     | 0.00725  |
|    ent_coef        | 0.000944 |
|    ent_coef_loss   | 1.8      |
|    learning_rate   | 0.0003   |
|    n_updates       | 71507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.81    |
| time/              |          |
|    episodes        | 2384     |
|    fps             | 132      |
|    time_elapsed    | 4331     |
|    total_timesteps | 572160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.8     |
| time/              |          |
|    episodes        | 2388     |
|    fps             | 131      |
|    time_elapsed    | 4356     |
|    total_timesteps | 574080   |
| train/             |          |
|    actor_loss      | 0.0663   |
|    critic_loss     | 0.000394 |
|    ent_coef        | 0.00102  |
|    ent_coef_loss   | 1.48     |
|    learning_rate   | 0.0003   |
|    n_updates       | 71747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.8     |
| time/              |          |
|    episodes        | 2392     |
|    fps             | 131      |
|    time_elapsed    | 4356     |
|    total_timesteps | 574080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.75    |
| time/              |          |
|    episodes        | 2396     |
|    fps             | 131      |
|    time_elapsed    | 4379     |
|    total_timesteps | 576000   |
| train/             |          |
|    actor_loss      | 0.14     |
|    critic_loss     | 0.00255  |
|    ent_coef        | 0.00107  |
|    ent_coef_loss   | 0.872    |
|    learning_rate   | 0.0003   |
|    n_updates       | 71987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -4.75    |
| time/              |          |
|    episodes        | 2400     |
|    fps             | 131      |
|    time_elapsed    | 4379     |
|    total_timesteps | 576000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -2.71    |
| time/              |          |
|    episodes        | 2404     |
|    fps             | 131      |
|    time_elapsed    | 4405     |
|    total_timesteps | 577920   |
| train/             |          |
|    actor_loss      | 0.0772   |
|    critic_loss     | 0.000192 |
|    ent_coef        | 0.00114  |
|    ent_coef_loss   | 0.608    |
|    learning_rate   | 0.0003   |
|    n_updates       | 72227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -2.71    |
| time/              |          |
|    episodes        | 2408     |
|    fps             | 131      |
|    time_elapsed    | 4405     |
|    total_timesteps | 577920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.753   |
| time/              |          |
|    episodes        | 2412     |
|    fps             | 130      |
|    time_elapsed    | 4430     |
|    total_timesteps | 579840   |
| train/             |          |
|    actor_loss      | 0.0817   |
|    critic_loss     | 0.000246 |
|    ent_coef        | 0.0012   |
|    ent_coef_loss   | 0.572    |
|    learning_rate   | 0.0003   |
|    n_updates       | 72467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.753   |
| time/              |          |
|    episodes        | 2416     |
|    fps             | 130      |
|    time_elapsed    | 4430     |
|    total_timesteps | 579840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.741   |
| time/              |          |
|    episodes        | 2420     |
|    fps             | 130      |
|    time_elapsed    | 4453     |
|    total_timesteps | 581760   |
| train/             |          |
|    actor_loss      | 0.082    |
|    critic_loss     | 0.000152 |
|    ent_coef        | 0.00128  |
|    ent_coef_loss   | 1.55     |
|    learning_rate   | 0.0003   |
|    n_updates       | 72707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.741   |
| time/              |          |
|    episodes        | 2424     |
|    fps             | 130      |
|    time_elapsed    | 4453     |
|    total_timesteps | 581760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.728   |
| time/              |          |
|    episodes        | 2428     |
|    fps             | 130      |
|    time_elapsed    | 4478     |
|    total_timesteps | 583680   |
| train/             |          |
|    actor_loss      | 0.14     |
|    critic_loss     | 0.00169  |
|    ent_coef        | 0.00138  |
|    ent_coef_loss   | 2.46     |
|    learning_rate   | 0.0003   |
|    n_updates       | 72947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.728   |
| time/              |          |
|    episodes        | 2432     |
|    fps             | 130      |
|    time_elapsed    | 4478     |
|    total_timesteps | 583680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.692   |
| time/              |          |
|    episodes        | 2436     |
|    fps             | 130      |
|    time_elapsed    | 4501     |
|    total_timesteps | 585600   |
| train/             |          |
|    actor_loss      | 0.113    |
|    critic_loss     | 0.000648 |
|    ent_coef        | 0.0015   |
|    ent_coef_loss   | 1.42     |
|    learning_rate   | 0.0003   |
|    n_updates       | 73187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.692   |
| time/              |          |
|    episodes        | 2440     |
|    fps             | 130      |
|    time_elapsed    | 4501     |
|    total_timesteps | 585600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.687   |
| time/              |          |
|    episodes        | 2444     |
|    fps             | 129      |
|    time_elapsed    | 4526     |
|    total_timesteps | 587520   |
| train/             |          |
|    actor_loss      | 0.229    |
|    critic_loss     | 0.0044   |
|    ent_coef        | 0.00159  |
|    ent_coef_loss   | 1.11     |
|    learning_rate   | 0.0003   |
|    n_updates       | 73427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.687   |
| time/              |          |
|    episodes        | 2448     |
|    fps             | 129      |
|    time_elapsed    | 4526     |
|    total_timesteps | 587520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.682   |
| time/              |          |
|    episodes        | 2452     |
|    fps             | 129      |
|    time_elapsed    | 4550     |
|    total_timesteps | 589440   |
| train/             |          |
|    actor_loss      | 0.0977   |
|    critic_loss     | 0.000137 |
|    ent_coef        | 0.00161  |
|    ent_coef_loss   | -1.39    |
|    learning_rate   | 0.0003   |
|    n_updates       | 73667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.682   |
| time/              |          |
|    episodes        | 2456     |
|    fps             | 129      |
|    time_elapsed    | 4550     |
|    total_timesteps | 589440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 2460     |
|    fps             | 129      |
|    time_elapsed    | 4574     |
|    total_timesteps | 591360   |
| train/             |          |
|    actor_loss      | 0.109    |
|    critic_loss     | 0.000233 |
|    ent_coef        | 0.00163  |
|    ent_coef_loss   | 0.837    |
|    learning_rate   | 0.0003   |
|    n_updates       | 73907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 2464     |
|    fps             | 129      |
|    time_elapsed    | 4574     |
|    total_timesteps | 591360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.676   |
| time/              |          |
|    episodes        | 2468     |
|    fps             | 128      |
|    time_elapsed    | 4599     |
|    total_timesteps | 593280   |
| train/             |          |
|    actor_loss      | 0.131    |
|    critic_loss     | 0.00168  |
|    ent_coef        | 0.00168  |
|    ent_coef_loss   | 0.272    |
|    learning_rate   | 0.0003   |
|    n_updates       | 74147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.676   |
| time/              |          |
|    episodes        | 2472     |
|    fps             | 128      |
|    time_elapsed    | 4599     |
|    total_timesteps | 593280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.662   |
| time/              |          |
|    episodes        | 2476     |
|    fps             | 128      |
|    time_elapsed    | 4614     |
|    total_timesteps | 595200   |
| train/             |          |
|    actor_loss      | 0.124    |
|    critic_loss     | 0.000224 |
|    ent_coef        | 0.00171  |
|    ent_coef_loss   | -0.603   |
|    learning_rate   | 0.0003   |
|    n_updates       | 74387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.662   |
| time/              |          |
|    episodes        | 2480     |
|    fps             | 128      |
|    time_elapsed    | 4614     |
|    total_timesteps | 595200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 2484     |
|    fps             | 128      |
|    time_elapsed    | 4629     |
|    total_timesteps | 597120   |
| train/             |          |
|    actor_loss      | 0.212    |
|    critic_loss     | 0.00126  |
|    ent_coef        | 0.00169  |
|    ent_coef_loss   | 0.113    |
|    learning_rate   | 0.0003   |
|    n_updates       | 74627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 2488     |
|    fps             | 128      |
|    time_elapsed    | 4629     |
|    total_timesteps | 597120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.686   |
| time/              |          |
|    episodes        | 2492     |
|    fps             | 128      |
|    time_elapsed    | 4645     |
|    total_timesteps | 599040   |
| train/             |          |
|    actor_loss      | 0.279    |
|    critic_loss     | 0.0017   |
|    ent_coef        | 0.00165  |
|    ent_coef_loss   | 0.456    |
|    learning_rate   | 0.0003   |
|    n_updates       | 74867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.686   |
| time/              |          |
|    episodes        | 2496     |
|    fps             | 128      |
|    time_elapsed    | 4645     |
|    total_timesteps | 599040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.66    |
| time/              |          |
|    episodes        | 2500     |
|    fps             | 128      |
|    time_elapsed    | 4659     |
|    total_timesteps | 600960   |
| train/             |          |
|    actor_loss      | 0.211    |
|    critic_loss     | 0.00735  |
|    ent_coef        | 0.00157  |
|    ent_coef_loss   | 0.577    |
|    learning_rate   | 0.0003   |
|    n_updates       | 75107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.66    |
| time/              |          |
|    episodes        | 2504     |
|    fps             | 128      |
|    time_elapsed    | 4659     |
|    total_timesteps | 600960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.655   |
| time/              |          |
|    episodes        | 2508     |
|    fps             | 128      |
|    time_elapsed    | 4673     |
|    total_timesteps | 602880   |
| train/             |          |
|    actor_loss      | 0.216    |
|    critic_loss     | 0.00286  |
|    ent_coef        | 0.00159  |
|    ent_coef_loss   | 3.48     |
|    learning_rate   | 0.0003   |
|    n_updates       | 75347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.655   |
| time/              |          |
|    episodes        | 2512     |
|    fps             | 128      |
|    time_elapsed    | 4673     |
|    total_timesteps | 602880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.672   |
| time/              |          |
|    episodes        | 2516     |
|    fps             | 128      |
|    time_elapsed    | 4697     |
|    total_timesteps | 604800   |
| train/             |          |
|    actor_loss      | 0.122    |
|    critic_loss     | 0.000141 |
|    ent_coef        | 0.00167  |
|    ent_coef_loss   | -0.339   |
|    learning_rate   | 0.0003   |
|    n_updates       | 75587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.672   |
| time/              |          |
|    episodes        | 2520     |
|    fps             | 128      |
|    time_elapsed    | 4697     |
|    total_timesteps | 604800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.684   |
| time/              |          |
|    episodes        | 2524     |
|    fps             | 128      |
|    time_elapsed    | 4733     |
|    total_timesteps | 606720   |
| train/             |          |
|    actor_loss      | 0.497    |
|    critic_loss     | 0.009    |
|    ent_coef        | 0.00166  |
|    ent_coef_loss   | 0.868    |
|    learning_rate   | 0.0003   |
|    n_updates       | 75827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.684   |
| time/              |          |
|    episodes        | 2528     |
|    fps             | 128      |
|    time_elapsed    | 4733     |
|    total_timesteps | 606720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 2532     |
|    fps             | 127      |
|    time_elapsed    | 4757     |
|    total_timesteps | 608640   |
| train/             |          |
|    actor_loss      | 0.141    |
|    critic_loss     | 0.000193 |
|    ent_coef        | 0.00162  |
|    ent_coef_loss   | -0.902   |
|    learning_rate   | 0.0003   |
|    n_updates       | 76067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 2536     |
|    fps             | 127      |
|    time_elapsed    | 4757     |
|    total_timesteps | 608640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.683   |
| time/              |          |
|    episodes        | 2540     |
|    fps             | 127      |
|    time_elapsed    | 4776     |
|    total_timesteps | 610560   |
| train/             |          |
|    actor_loss      | 0.194    |
|    critic_loss     | 0.00354  |
|    ent_coef        | 0.00159  |
|    ent_coef_loss   | 0.6      |
|    learning_rate   | 0.0003   |
|    n_updates       | 76307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.683   |
| time/              |          |
|    episodes        | 2544     |
|    fps             | 127      |
|    time_elapsed    | 4776     |
|    total_timesteps | 610560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 2548     |
|    fps             | 127      |
|    time_elapsed    | 4787     |
|    total_timesteps | 612480   |
| train/             |          |
|    actor_loss      | 0.227    |
|    critic_loss     | 0.00134  |
|    ent_coef        | 0.00159  |
|    ent_coef_loss   | 2.27     |
|    learning_rate   | 0.0003   |
|    n_updates       | 76547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.688   |
| time/              |          |
|    episodes        | 2552     |
|    fps             | 127      |
|    time_elapsed    | 4787     |
|    total_timesteps | 612480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.681   |
| time/              |          |
|    episodes        | 2556     |
|    fps             | 128      |
|    time_elapsed    | 4799     |
|    total_timesteps | 614400   |
| train/             |          |
|    actor_loss      | 0.155    |
|    critic_loss     | 0.000153 |
|    ent_coef        | 0.00158  |
|    ent_coef_loss   | -0.485   |
|    learning_rate   | 0.0003   |
|    n_updates       | 76787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.681   |
| time/              |          |
|    episodes        | 2560     |
|    fps             | 128      |
|    time_elapsed    | 4799     |
|    total_timesteps | 614400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.678   |
| time/              |          |
|    episodes        | 2564     |
|    fps             | 127      |
|    time_elapsed    | 4820     |
|    total_timesteps | 616320   |
| train/             |          |
|    actor_loss      | 0.189    |
|    critic_loss     | 0.00199  |
|    ent_coef        | 0.00152  |
|    ent_coef_loss   | -1       |
|    learning_rate   | 0.0003   |
|    n_updates       | 77027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.678   |
| time/              |          |
|    episodes        | 2568     |
|    fps             | 127      |
|    time_elapsed    | 4820     |
|    total_timesteps | 616320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.679   |
| time/              |          |
|    episodes        | 2572     |
|    fps             | 127      |
|    time_elapsed    | 4833     |
|    total_timesteps | 618240   |
| train/             |          |
|    actor_loss      | 0.157    |
|    critic_loss     | 0.000565 |
|    ent_coef        | 0.00143  |
|    ent_coef_loss   | -2       |
|    learning_rate   | 0.0003   |
|    n_updates       | 77267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.679   |
| time/              |          |
|    episodes        | 2576     |
|    fps             | 127      |
|    time_elapsed    | 4833     |
|    total_timesteps | 618240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.665   |
| time/              |          |
|    episodes        | 2580     |
|    fps             | 128      |
|    time_elapsed    | 4844     |
|    total_timesteps | 620160   |
| train/             |          |
|    actor_loss      | 0.156    |
|    critic_loss     | 0.000151 |
|    ent_coef        | 0.00137  |
|    ent_coef_loss   | -1.37    |
|    learning_rate   | 0.0003   |
|    n_updates       | 77507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.665   |
| time/              |          |
|    episodes        | 2584     |
|    fps             | 128      |
|    time_elapsed    | 4844     |
|    total_timesteps | 620160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.65    |
| time/              |          |
|    episodes        | 2588     |
|    fps             | 128      |
|    time_elapsed    | 4853     |
|    total_timesteps | 622080   |
| train/             |          |
|    actor_loss      | 0.193    |
|    critic_loss     | 0.000518 |
|    ent_coef        | 0.0013   |
|    ent_coef_loss   | -0.071   |
|    learning_rate   | 0.0003   |
|    n_updates       | 77747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.65    |
| time/              |          |
|    episodes        | 2592     |
|    fps             | 128      |
|    time_elapsed    | 4853     |
|    total_timesteps | 622080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.643   |
| time/              |          |
|    episodes        | 2596     |
|    fps             | 128      |
|    time_elapsed    | 4863     |
|    total_timesteps | 624000   |
| train/             |          |
|    actor_loss      | 0.161    |
|    critic_loss     | 0.000127 |
|    ent_coef        | 0.0013   |
|    ent_coef_loss   | -0.588   |
|    learning_rate   | 0.0003   |
|    n_updates       | 77987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.643   |
| time/              |          |
|    episodes        | 2600     |
|    fps             | 128      |
|    time_elapsed    | 4863     |
|    total_timesteps | 624000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.639   |
| time/              |          |
|    episodes        | 2604     |
|    fps             | 128      |
|    time_elapsed    | 4873     |
|    total_timesteps | 625920   |
| train/             |          |
|    actor_loss      | 0.185    |
|    critic_loss     | 0.000252 |
|    ent_coef        | 0.00123  |
|    ent_coef_loss   | -0.801   |
|    learning_rate   | 0.0003   |
|    n_updates       | 78227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.639   |
| time/              |          |
|    episodes        | 2608     |
|    fps             | 128      |
|    time_elapsed    | 4873     |
|    total_timesteps | 625920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 2612     |
|    fps             | 128      |
|    time_elapsed    | 4882     |
|    total_timesteps | 627840   |
| train/             |          |
|    actor_loss      | 0.178    |
|    critic_loss     | 0.000195 |
|    ent_coef        | 0.00124  |
|    ent_coef_loss   | -0.817   |
|    learning_rate   | 0.0003   |
|    n_updates       | 78467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.67    |
| time/              |          |
|    episodes        | 2616     |
|    fps             | 128      |
|    time_elapsed    | 4882     |
|    total_timesteps | 627840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.657   |
| time/              |          |
|    episodes        | 2620     |
|    fps             | 128      |
|    time_elapsed    | 4892     |
|    total_timesteps | 629760   |
| train/             |          |
|    actor_loss      | 0.299    |
|    critic_loss     | 0.00409  |
|    ent_coef        | 0.00121  |
|    ent_coef_loss   | -0.767   |
|    learning_rate   | 0.0003   |
|    n_updates       | 78707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.657   |
| time/              |          |
|    episodes        | 2624     |
|    fps             | 128      |
|    time_elapsed    | 4892     |
|    total_timesteps | 629760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.658   |
| time/              |          |
|    episodes        | 2628     |
|    fps             | 128      |
|    time_elapsed    | 4902     |
|    total_timesteps | 631680   |
| train/             |          |
|    actor_loss      | 0.178    |
|    critic_loss     | 0.000184 |
|    ent_coef        | 0.00116  |
|    ent_coef_loss   | -2.12    |
|    learning_rate   | 0.0003   |
|    n_updates       | 78947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.658   |
| time/              |          |
|    episodes        | 2632     |
|    fps             | 128      |
|    time_elapsed    | 4902     |
|    total_timesteps | 631680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.645   |
| time/              |          |
|    episodes        | 2636     |
|    fps             | 128      |
|    time_elapsed    | 4911     |
|    total_timesteps | 633600   |
| train/             |          |
|    actor_loss      | 0.175    |
|    critic_loss     | 0.000115 |
|    ent_coef        | 0.00112  |
|    ent_coef_loss   | -0.991   |
|    learning_rate   | 0.0003   |
|    n_updates       | 79187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.645   |
| time/              |          |
|    episodes        | 2640     |
|    fps             | 128      |
|    time_elapsed    | 4911     |
|    total_timesteps | 633600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.628   |
| time/              |          |
|    episodes        | 2644     |
|    fps             | 129      |
|    time_elapsed    | 4921     |
|    total_timesteps | 635520   |
| train/             |          |
|    actor_loss      | 0.18     |
|    critic_loss     | 0.000144 |
|    ent_coef        | 0.00112  |
|    ent_coef_loss   | -0.0348  |
|    learning_rate   | 0.0003   |
|    n_updates       | 79427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.628   |
| time/              |          |
|    episodes        | 2648     |
|    fps             | 129      |
|    time_elapsed    | 4921     |
|    total_timesteps | 635520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.628   |
| time/              |          |
|    episodes        | 2652     |
|    fps             | 128      |
|    time_elapsed    | 4967     |
|    total_timesteps | 637440   |
| train/             |          |
|    actor_loss      | 0.191    |
|    critic_loss     | 0.00167  |
|    ent_coef        | 0.00109  |
|    ent_coef_loss   | -0.731   |
|    learning_rate   | 0.0003   |
|    n_updates       | 79667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.628   |
| time/              |          |
|    episodes        | 2656     |
|    fps             | 128      |
|    time_elapsed    | 4967     |
|    total_timesteps | 637440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.644   |
| time/              |          |
|    episodes        | 2660     |
|    fps             | 128      |
|    time_elapsed    | 4980     |
|    total_timesteps | 639360   |
| train/             |          |
|    actor_loss      | 0.229    |
|    critic_loss     | 0.00158  |
|    ent_coef        | 0.00104  |
|    ent_coef_loss   | -1.11    |
|    learning_rate   | 0.0003   |
|    n_updates       | 79907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.644   |
| time/              |          |
|    episodes        | 2664     |
|    fps             | 128      |
|    time_elapsed    | 4980     |
|    total_timesteps | 639360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.642   |
| time/              |          |
|    episodes        | 2668     |
|    fps             | 128      |
|    time_elapsed    | 4994     |
|    total_timesteps | 641280   |
| train/             |          |
|    actor_loss      | 0.312    |
|    critic_loss     | 0.00113  |
|    ent_coef        | 0.00106  |
|    ent_coef_loss   | 0.999    |
|    learning_rate   | 0.0003   |
|    n_updates       | 80147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.642   |
| time/              |          |
|    episodes        | 2672     |
|    fps             | 128      |
|    time_elapsed    | 4994     |
|    total_timesteps | 641280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.628   |
| time/              |          |
|    episodes        | 2676     |
|    fps             | 128      |
|    time_elapsed    | 5007     |
|    total_timesteps | 643200   |
| train/             |          |
|    actor_loss      | 0.234    |
|    critic_loss     | 0.00935  |
|    ent_coef        | 0.00106  |
|    ent_coef_loss   | 0.531    |
|    learning_rate   | 0.0003   |
|    n_updates       | 80387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.628   |
| time/              |          |
|    episodes        | 2680     |
|    fps             | 128      |
|    time_elapsed    | 5007     |
|    total_timesteps | 643200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.641   |
| time/              |          |
|    episodes        | 2684     |
|    fps             | 128      |
|    time_elapsed    | 5032     |
|    total_timesteps | 645120   |
| train/             |          |
|    actor_loss      | 0.232    |
|    critic_loss     | 0.00371  |
|    ent_coef        | 0.00104  |
|    ent_coef_loss   | 1.19     |
|    learning_rate   | 0.0003   |
|    n_updates       | 80627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.641   |
| time/              |          |
|    episodes        | 2688     |
|    fps             | 128      |
|    time_elapsed    | 5032     |
|    total_timesteps | 645120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 2692     |
|    fps             | 127      |
|    time_elapsed    | 5065     |
|    total_timesteps | 647040   |
| train/             |          |
|    actor_loss      | 0.201    |
|    critic_loss     | 0.000128 |
|    ent_coef        | 0.00103  |
|    ent_coef_loss   | -1.18    |
|    learning_rate   | 0.0003   |
|    n_updates       | 80867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 2696     |
|    fps             | 127      |
|    time_elapsed    | 5065     |
|    total_timesteps | 647040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.641   |
| time/              |          |
|    episodes        | 2700     |
|    fps             | 127      |
|    time_elapsed    | 5079     |
|    total_timesteps | 648960   |
| train/             |          |
|    actor_loss      | 0.201    |
|    critic_loss     | 0.000139 |
|    ent_coef        | 0.00103  |
|    ent_coef_loss   | -0.33    |
|    learning_rate   | 0.0003   |
|    n_updates       | 81107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.641   |
| time/              |          |
|    episodes        | 2704     |
|    fps             | 127      |
|    time_elapsed    | 5079     |
|    total_timesteps | 648960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.627   |
| time/              |          |
|    episodes        | 2708     |
|    fps             | 127      |
|    time_elapsed    | 5093     |
|    total_timesteps | 650880   |
| train/             |          |
|    actor_loss      | 0.3      |
|    critic_loss     | 0.000214 |
|    ent_coef        | 0.00103  |
|    ent_coef_loss   | 0.0361   |
|    learning_rate   | 0.0003   |
|    n_updates       | 81347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.627   |
| time/              |          |
|    episodes        | 2712     |
|    fps             | 127      |
|    time_elapsed    | 5093     |
|    total_timesteps | 650880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.598   |
| time/              |          |
|    episodes        | 2716     |
|    fps             | 127      |
|    time_elapsed    | 5108     |
|    total_timesteps | 652800   |
| train/             |          |
|    actor_loss      | 0.21     |
|    critic_loss     | 0.00328  |
|    ent_coef        | 0.00102  |
|    ent_coef_loss   | -0.714   |
|    learning_rate   | 0.0003   |
|    n_updates       | 81587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.598   |
| time/              |          |
|    episodes        | 2720     |
|    fps             | 127      |
|    time_elapsed    | 5108     |
|    total_timesteps | 652800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.587   |
| time/              |          |
|    episodes        | 2724     |
|    fps             | 127      |
|    time_elapsed    | 5130     |
|    total_timesteps | 654720   |
| train/             |          |
|    actor_loss      | 0.231    |
|    critic_loss     | 0.00179  |
|    ent_coef        | 0.00102  |
|    ent_coef_loss   | 2.39     |
|    learning_rate   | 0.0003   |
|    n_updates       | 81827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.587   |
| time/              |          |
|    episodes        | 2728     |
|    fps             | 127      |
|    time_elapsed    | 5130     |
|    total_timesteps | 654720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.599   |
| time/              |          |
|    episodes        | 2732     |
|    fps             | 127      |
|    time_elapsed    | 5149     |
|    total_timesteps | 656640   |
| train/             |          |
|    actor_loss      | 0.236    |
|    critic_loss     | 0.00133  |
|    ent_coef        | 0.00104  |
|    ent_coef_loss   | 0.0109   |
|    learning_rate   | 0.0003   |
|    n_updates       | 82067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.599   |
| time/              |          |
|    episodes        | 2736     |
|    fps             | 127      |
|    time_elapsed    | 5149     |
|    total_timesteps | 656640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.607   |
| time/              |          |
|    episodes        | 2740     |
|    fps             | 127      |
|    time_elapsed    | 5169     |
|    total_timesteps | 658560   |
| train/             |          |
|    actor_loss      | 0.217    |
|    critic_loss     | 0.00074  |
|    ent_coef        | 0.000992 |
|    ent_coef_loss   | -1.58    |
|    learning_rate   | 0.0003   |
|    n_updates       | 82307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.607   |
| time/              |          |
|    episodes        | 2744     |
|    fps             | 127      |
|    time_elapsed    | 5169     |
|    total_timesteps | 658560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.633   |
| time/              |          |
|    episodes        | 2748     |
|    fps             | 127      |
|    time_elapsed    | 5187     |
|    total_timesteps | 660480   |
| train/             |          |
|    actor_loss      | 0.284    |
|    critic_loss     | 0.000285 |
|    ent_coef        | 0.000928 |
|    ent_coef_loss   | 0.592    |
|    learning_rate   | 0.0003   |
|    n_updates       | 82547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.633   |
| time/              |          |
|    episodes        | 2752     |
|    fps             | 127      |
|    time_elapsed    | 5187     |
|    total_timesteps | 660480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.626   |
| time/              |          |
|    episodes        | 2756     |
|    fps             | 127      |
|    time_elapsed    | 5200     |
|    total_timesteps | 662400   |
| train/             |          |
|    actor_loss      | 0.258    |
|    critic_loss     | 0.00257  |
|    ent_coef        | 0.000908 |
|    ent_coef_loss   | -0.152   |
|    learning_rate   | 0.0003   |
|    n_updates       | 82787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.626   |
| time/              |          |
|    episodes        | 2760     |
|    fps             | 127      |
|    time_elapsed    | 5200     |
|    total_timesteps | 662400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.635   |
| time/              |          |
|    episodes        | 2764     |
|    fps             | 127      |
|    time_elapsed    | 5214     |
|    total_timesteps | 664320   |
| train/             |          |
|    actor_loss      | 0.232    |
|    critic_loss     | 0.000181 |
|    ent_coef        | 0.000902 |
|    ent_coef_loss   | -3.04    |
|    learning_rate   | 0.0003   |
|    n_updates       | 83027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.635   |
| time/              |          |
|    episodes        | 2768     |
|    fps             | 127      |
|    time_elapsed    | 5214     |
|    total_timesteps | 664320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.65    |
| time/              |          |
|    episodes        | 2772     |
|    fps             | 127      |
|    time_elapsed    | 5230     |
|    total_timesteps | 666240   |
| train/             |          |
|    actor_loss      | 0.269    |
|    critic_loss     | 0.000213 |
|    ent_coef        | 0.000858 |
|    ent_coef_loss   | -0.235   |
|    learning_rate   | 0.0003   |
|    n_updates       | 83267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.65    |
| time/              |          |
|    episodes        | 2776     |
|    fps             | 127      |
|    time_elapsed    | 5230     |
|    total_timesteps | 666240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.645   |
| time/              |          |
|    episodes        | 2780     |
|    fps             | 127      |
|    time_elapsed    | 5256     |
|    total_timesteps | 668160   |
| train/             |          |
|    actor_loss      | 0.233    |
|    critic_loss     | 0.00418  |
|    ent_coef        | 0.000861 |
|    ent_coef_loss   | 1.25     |
|    learning_rate   | 0.0003   |
|    n_updates       | 83507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.645   |
| time/              |          |
|    episodes        | 2784     |
|    fps             | 127      |
|    time_elapsed    | 5256     |
|    total_timesteps | 668160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 2788     |
|    fps             | 127      |
|    time_elapsed    | 5274     |
|    total_timesteps | 670080   |
| train/             |          |
|    actor_loss      | 0.227    |
|    critic_loss     | 0.000194 |
|    ent_coef        | 0.00092  |
|    ent_coef_loss   | -0.5     |
|    learning_rate   | 0.0003   |
|    n_updates       | 83747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 2792     |
|    fps             | 127      |
|    time_elapsed    | 5274     |
|    total_timesteps | 670080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.655   |
| time/              |          |
|    episodes        | 2796     |
|    fps             | 126      |
|    time_elapsed    | 5293     |
|    total_timesteps | 672000   |
| train/             |          |
|    actor_loss      | 0.226    |
|    critic_loss     | 0.000148 |
|    ent_coef        | 0.000961 |
|    ent_coef_loss   | -0.861   |
|    learning_rate   | 0.0003   |
|    n_updates       | 83987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.655   |
| time/              |          |
|    episodes        | 2800     |
|    fps             | 126      |
|    time_elapsed    | 5293     |
|    total_timesteps | 672000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 2804     |
|    fps             | 126      |
|    time_elapsed    | 5313     |
|    total_timesteps | 673920   |
| train/             |          |
|    actor_loss      | 0.402    |
|    critic_loss     | 0.00269  |
|    ent_coef        | 0.000972 |
|    ent_coef_loss   | 0.804    |
|    learning_rate   | 0.0003   |
|    n_updates       | 84227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 2808     |
|    fps             | 126      |
|    time_elapsed    | 5313     |
|    total_timesteps | 673920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.655   |
| time/              |          |
|    episodes        | 2812     |
|    fps             | 126      |
|    time_elapsed    | 5332     |
|    total_timesteps | 675840   |
| train/             |          |
|    actor_loss      | 0.241    |
|    critic_loss     | 0.000145 |
|    ent_coef        | 0.00099  |
|    ent_coef_loss   | -0.858   |
|    learning_rate   | 0.0003   |
|    n_updates       | 84467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.655   |
| time/              |          |
|    episodes        | 2816     |
|    fps             | 126      |
|    time_elapsed    | 5332     |
|    total_timesteps | 675840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.653   |
| time/              |          |
|    episodes        | 2820     |
|    fps             | 126      |
|    time_elapsed    | 5351     |
|    total_timesteps | 677760   |
| train/             |          |
|    actor_loss      | 0.382    |
|    critic_loss     | 0.00418  |
|    ent_coef        | 0.000973 |
|    ent_coef_loss   | 0.396    |
|    learning_rate   | 0.0003   |
|    n_updates       | 84707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.653   |
| time/              |          |
|    episodes        | 2824     |
|    fps             | 126      |
|    time_elapsed    | 5351     |
|    total_timesteps | 677760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.69    |
| time/              |          |
|    episodes        | 2828     |
|    fps             | 126      |
|    time_elapsed    | 5370     |
|    total_timesteps | 679680   |
| train/             |          |
|    actor_loss      | 0.26     |
|    critic_loss     | 0.000276 |
|    ent_coef        | 0.00094  |
|    ent_coef_loss   | -2.17    |
|    learning_rate   | 0.0003   |
|    n_updates       | 84947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.69    |
| time/              |          |
|    episodes        | 2832     |
|    fps             | 126      |
|    time_elapsed    | 5370     |
|    total_timesteps | 679680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.687   |
| time/              |          |
|    episodes        | 2836     |
|    fps             | 126      |
|    time_elapsed    | 5389     |
|    total_timesteps | 681600   |
| train/             |          |
|    actor_loss      | 0.406    |
|    critic_loss     | 0.0112   |
|    ent_coef        | 0.000911 |
|    ent_coef_loss   | 1.72     |
|    learning_rate   | 0.0003   |
|    n_updates       | 85187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.687   |
| time/              |          |
|    episodes        | 2840     |
|    fps             | 126      |
|    time_elapsed    | 5389     |
|    total_timesteps | 681600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.687   |
| time/              |          |
|    episodes        | 2844     |
|    fps             | 126      |
|    time_elapsed    | 5408     |
|    total_timesteps | 683520   |
| train/             |          |
|    actor_loss      | 0.264    |
|    critic_loss     | 0.00738  |
|    ent_coef        | 0.000924 |
|    ent_coef_loss   | -1.25    |
|    learning_rate   | 0.0003   |
|    n_updates       | 85427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.687   |
| time/              |          |
|    episodes        | 2848     |
|    fps             | 126      |
|    time_elapsed    | 5408     |
|    total_timesteps | 683520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.669   |
| time/              |          |
|    episodes        | 2852     |
|    fps             | 126      |
|    time_elapsed    | 5426     |
|    total_timesteps | 685440   |
| train/             |          |
|    actor_loss      | 0.252    |
|    critic_loss     | 0.000202 |
|    ent_coef        | 0.000945 |
|    ent_coef_loss   | 0.101    |
|    learning_rate   | 0.0003   |
|    n_updates       | 85667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.669   |
| time/              |          |
|    episodes        | 2856     |
|    fps             | 126      |
|    time_elapsed    | 5426     |
|    total_timesteps | 685440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.65    |
| time/              |          |
|    episodes        | 2860     |
|    fps             | 126      |
|    time_elapsed    | 5445     |
|    total_timesteps | 687360   |
| train/             |          |
|    actor_loss      | 0.254    |
|    critic_loss     | 0.000191 |
|    ent_coef        | 0.00095  |
|    ent_coef_loss   | -0.295   |
|    learning_rate   | 0.0003   |
|    n_updates       | 85907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.65    |
| time/              |          |
|    episodes        | 2864     |
|    fps             | 126      |
|    time_elapsed    | 5445     |
|    total_timesteps | 687360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.639   |
| time/              |          |
|    episodes        | 2868     |
|    fps             | 126      |
|    time_elapsed    | 5465     |
|    total_timesteps | 689280   |
| train/             |          |
|    actor_loss      | 0.248    |
|    critic_loss     | 0.000181 |
|    ent_coef        | 0.000974 |
|    ent_coef_loss   | -0.765   |
|    learning_rate   | 0.0003   |
|    n_updates       | 86147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.639   |
| time/              |          |
|    episodes        | 2872     |
|    fps             | 126      |
|    time_elapsed    | 5465     |
|    total_timesteps | 689280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.647   |
| time/              |          |
|    episodes        | 2876     |
|    fps             | 126      |
|    time_elapsed    | 5484     |
|    total_timesteps | 691200   |
| train/             |          |
|    actor_loss      | 0.275    |
|    critic_loss     | 0.00184  |
|    ent_coef        | 0.000966 |
|    ent_coef_loss   | -0.0676  |
|    learning_rate   | 0.0003   |
|    n_updates       | 86387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.647   |
| time/              |          |
|    episodes        | 2880     |
|    fps             | 126      |
|    time_elapsed    | 5484     |
|    total_timesteps | 691200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 2884     |
|    fps             | 125      |
|    time_elapsed    | 5503     |
|    total_timesteps | 693120   |
| train/             |          |
|    actor_loss      | 0.265    |
|    critic_loss     | 0.000844 |
|    ent_coef        | 0.000938 |
|    ent_coef_loss   | 0.164    |
|    learning_rate   | 0.0003   |
|    n_updates       | 86627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 2888     |
|    fps             | 125      |
|    time_elapsed    | 5503     |
|    total_timesteps | 693120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.638   |
| time/              |          |
|    episodes        | 2892     |
|    fps             | 125      |
|    time_elapsed    | 5522     |
|    total_timesteps | 695040   |
| train/             |          |
|    actor_loss      | 0.343    |
|    critic_loss     | 0.0137   |
|    ent_coef        | 0.000936 |
|    ent_coef_loss   | 1.08     |
|    learning_rate   | 0.0003   |
|    n_updates       | 86867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.638   |
| time/              |          |
|    episodes        | 2896     |
|    fps             | 125      |
|    time_elapsed    | 5522     |
|    total_timesteps | 695040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.63    |
| time/              |          |
|    episodes        | 2900     |
|    fps             | 125      |
|    time_elapsed    | 5549     |
|    total_timesteps | 696960   |
| train/             |          |
|    actor_loss      | 0.263    |
|    critic_loss     | 0.000298 |
|    ent_coef        | 0.000947 |
|    ent_coef_loss   | 0.242    |
|    learning_rate   | 0.0003   |
|    n_updates       | 87107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.63    |
| time/              |          |
|    episodes        | 2904     |
|    fps             | 125      |
|    time_elapsed    | 5549     |
|    total_timesteps | 696960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.616   |
| time/              |          |
|    episodes        | 2908     |
|    fps             | 125      |
|    time_elapsed    | 5572     |
|    total_timesteps | 698880   |
| train/             |          |
|    actor_loss      | 0.292    |
|    critic_loss     | 0.00486  |
|    ent_coef        | 0.00101  |
|    ent_coef_loss   | 0.812    |
|    learning_rate   | 0.0003   |
|    n_updates       | 87347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.616   |
| time/              |          |
|    episodes        | 2912     |
|    fps             | 125      |
|    time_elapsed    | 5572     |
|    total_timesteps | 698880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.603   |
| time/              |          |
|    episodes        | 2916     |
|    fps             | 125      |
|    time_elapsed    | 5594     |
|    total_timesteps | 700800   |
| train/             |          |
|    actor_loss      | 0.316    |
|    critic_loss     | 0.000438 |
|    ent_coef        | 0.000966 |
|    ent_coef_loss   | -1.37    |
|    learning_rate   | 0.0003   |
|    n_updates       | 87587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.603   |
| time/              |          |
|    episodes        | 2920     |
|    fps             | 125      |
|    time_elapsed    | 5594     |
|    total_timesteps | 700800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.587   |
| time/              |          |
|    episodes        | 2924     |
|    fps             | 125      |
|    time_elapsed    | 5618     |
|    total_timesteps | 702720   |
| train/             |          |
|    actor_loss      | 0.267    |
|    critic_loss     | 0.00015  |
|    ent_coef        | 0.000965 |
|    ent_coef_loss   | -0.589   |
|    learning_rate   | 0.0003   |
|    n_updates       | 87827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.587   |
| time/              |          |
|    episodes        | 2928     |
|    fps             | 125      |
|    time_elapsed    | 5618     |
|    total_timesteps | 702720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.571   |
| time/              |          |
|    episodes        | 2932     |
|    fps             | 124      |
|    time_elapsed    | 5638     |
|    total_timesteps | 704640   |
| train/             |          |
|    actor_loss      | 0.269    |
|    critic_loss     | 0.000234 |
|    ent_coef        | 0.000953 |
|    ent_coef_loss   | -0.357   |
|    learning_rate   | 0.0003   |
|    n_updates       | 88067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.571   |
| time/              |          |
|    episodes        | 2936     |
|    fps             | 124      |
|    time_elapsed    | 5638     |
|    total_timesteps | 704640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.564   |
| time/              |          |
|    episodes        | 2940     |
|    fps             | 124      |
|    time_elapsed    | 5662     |
|    total_timesteps | 706560   |
| train/             |          |
|    actor_loss      | 0.263    |
|    critic_loss     | 0.000139 |
|    ent_coef        | 0.000985 |
|    ent_coef_loss   | -0.236   |
|    learning_rate   | 0.0003   |
|    n_updates       | 88307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.564   |
| time/              |          |
|    episodes        | 2944     |
|    fps             | 124      |
|    time_elapsed    | 5662     |
|    total_timesteps | 706560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.563   |
| time/              |          |
|    episodes        | 2948     |
|    fps             | 124      |
|    time_elapsed    | 5684     |
|    total_timesteps | 708480   |
| train/             |          |
|    actor_loss      | 0.275    |
|    critic_loss     | 0.00016  |
|    ent_coef        | 0.000989 |
|    ent_coef_loss   | -0.611   |
|    learning_rate   | 0.0003   |
|    n_updates       | 88547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.563   |
| time/              |          |
|    episodes        | 2952     |
|    fps             | 124      |
|    time_elapsed    | 5684     |
|    total_timesteps | 708480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.582   |
| time/              |          |
|    episodes        | 2956     |
|    fps             | 124      |
|    time_elapsed    | 5704     |
|    total_timesteps | 710400   |
| train/             |          |
|    actor_loss      | 0.275    |
|    critic_loss     | 0.00017  |
|    ent_coef        | 0.000968 |
|    ent_coef_loss   | -0.47    |
|    learning_rate   | 0.0003   |
|    n_updates       | 88787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.582   |
| time/              |          |
|    episodes        | 2960     |
|    fps             | 124      |
|    time_elapsed    | 5704     |
|    total_timesteps | 710400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 2964     |
|    fps             | 124      |
|    time_elapsed    | 5725     |
|    total_timesteps | 712320   |
| train/             |          |
|    actor_loss      | 0.279    |
|    critic_loss     | 0.000187 |
|    ent_coef        | 0.000994 |
|    ent_coef_loss   | -1.38    |
|    learning_rate   | 0.0003   |
|    n_updates       | 89027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.584   |
| time/              |          |
|    episodes        | 2968     |
|    fps             | 124      |
|    time_elapsed    | 5725     |
|    total_timesteps | 712320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.568   |
| time/              |          |
|    episodes        | 2972     |
|    fps             | 124      |
|    time_elapsed    | 5746     |
|    total_timesteps | 714240   |
| train/             |          |
|    actor_loss      | 0.299    |
|    critic_loss     | 0.005    |
|    ent_coef        | 0.00102  |
|    ent_coef_loss   | 1.38     |
|    learning_rate   | 0.0003   |
|    n_updates       | 89267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.568   |
| time/              |          |
|    episodes        | 2976     |
|    fps             | 124      |
|    time_elapsed    | 5746     |
|    total_timesteps | 714240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.568   |
| time/              |          |
|    episodes        | 2980     |
|    fps             | 124      |
|    time_elapsed    | 5771     |
|    total_timesteps | 716160   |
| train/             |          |
|    actor_loss      | 0.283    |
|    critic_loss     | 0.000417 |
|    ent_coef        | 0.00106  |
|    ent_coef_loss   | 0.303    |
|    learning_rate   | 0.0003   |
|    n_updates       | 89507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.568   |
| time/              |          |
|    episodes        | 2984     |
|    fps             | 124      |
|    time_elapsed    | 5771     |
|    total_timesteps | 716160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.57    |
| time/              |          |
|    episodes        | 2988     |
|    fps             | 123      |
|    time_elapsed    | 5792     |
|    total_timesteps | 718080   |
| train/             |          |
|    actor_loss      | 0.371    |
|    critic_loss     | 0.0132   |
|    ent_coef        | 0.00106  |
|    ent_coef_loss   | 0.593    |
|    learning_rate   | 0.0003   |
|    n_updates       | 89747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.57    |
| time/              |          |
|    episodes        | 2992     |
|    fps             | 123      |
|    time_elapsed    | 5792     |
|    total_timesteps | 718080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.567   |
| time/              |          |
|    episodes        | 2996     |
|    fps             | 124      |
|    time_elapsed    | 5804     |
|    total_timesteps | 720000   |
| train/             |          |
|    actor_loss      | 0.303    |
|    critic_loss     | 0.000545 |
|    ent_coef        | 0.00105  |
|    ent_coef_loss   | 1.27     |
|    learning_rate   | 0.0003   |
|    n_updates       | 89987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.567   |
| time/              |          |
|    episodes        | 3000     |
|    fps             | 124      |
|    time_elapsed    | 5804     |
|    total_timesteps | 720000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.569   |
| time/              |          |
|    episodes        | 3004     |
|    fps             | 123      |
|    time_elapsed    | 5823     |
|    total_timesteps | 721920   |
| train/             |          |
|    actor_loss      | 0.35     |
|    critic_loss     | 0.0103   |
|    ent_coef        | 0.00106  |
|    ent_coef_loss   | 0.755    |
|    learning_rate   | 0.0003   |
|    n_updates       | 90227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.569   |
| time/              |          |
|    episodes        | 3008     |
|    fps             | 123      |
|    time_elapsed    | 5823     |
|    total_timesteps | 721920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.581   |
| time/              |          |
|    episodes        | 3012     |
|    fps             | 123      |
|    time_elapsed    | 5845     |
|    total_timesteps | 723840   |
| train/             |          |
|    actor_loss      | 0.297    |
|    critic_loss     | 0.000179 |
|    ent_coef        | 0.00105  |
|    ent_coef_loss   | 0.26     |
|    learning_rate   | 0.0003   |
|    n_updates       | 90467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.581   |
| time/              |          |
|    episodes        | 3016     |
|    fps             | 123      |
|    time_elapsed    | 5845     |
|    total_timesteps | 723840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.577   |
| time/              |          |
|    episodes        | 3020     |
|    fps             | 123      |
|    time_elapsed    | 5866     |
|    total_timesteps | 725760   |
| train/             |          |
|    actor_loss      | 0.29     |
|    critic_loss     | 0.000504 |
|    ent_coef        | 0.00104  |
|    ent_coef_loss   | -1.06    |
|    learning_rate   | 0.0003   |
|    n_updates       | 90707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.577   |
| time/              |          |
|    episodes        | 3024     |
|    fps             | 123      |
|    time_elapsed    | 5866     |
|    total_timesteps | 725760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.572   |
| time/              |          |
|    episodes        | 3028     |
|    fps             | 123      |
|    time_elapsed    | 5886     |
|    total_timesteps | 727680   |
| train/             |          |
|    actor_loss      | 0.323    |
|    critic_loss     | 0.000281 |
|    ent_coef        | 0.001    |
|    ent_coef_loss   | -0.694   |
|    learning_rate   | 0.0003   |
|    n_updates       | 90947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.572   |
| time/              |          |
|    episodes        | 3032     |
|    fps             | 123      |
|    time_elapsed    | 5886     |
|    total_timesteps | 727680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.574   |
| time/              |          |
|    episodes        | 3036     |
|    fps             | 123      |
|    time_elapsed    | 5906     |
|    total_timesteps | 729600   |
| train/             |          |
|    actor_loss      | 0.395    |
|    critic_loss     | 0.0135   |
|    ent_coef        | 0.00095  |
|    ent_coef_loss   | -0.466   |
|    learning_rate   | 0.0003   |
|    n_updates       | 91187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.574   |
| time/              |          |
|    episodes        | 3040     |
|    fps             | 123      |
|    time_elapsed    | 5906     |
|    total_timesteps | 729600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.57    |
| time/              |          |
|    episodes        | 3044     |
|    fps             | 123      |
|    time_elapsed    | 5929     |
|    total_timesteps | 731520   |
| train/             |          |
|    actor_loss      | 0.291    |
|    critic_loss     | 0.000193 |
|    ent_coef        | 0.000929 |
|    ent_coef_loss   | -1.38    |
|    learning_rate   | 0.0003   |
|    n_updates       | 91427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.57    |
| time/              |          |
|    episodes        | 3048     |
|    fps             | 123      |
|    time_elapsed    | 5929     |
|    total_timesteps | 731520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.573   |
| time/              |          |
|    episodes        | 3052     |
|    fps             | 123      |
|    time_elapsed    | 5951     |
|    total_timesteps | 733440   |
| train/             |          |
|    actor_loss      | 0.363    |
|    critic_loss     | 0.0103   |
|    ent_coef        | 0.000917 |
|    ent_coef_loss   | 2.82     |
|    learning_rate   | 0.0003   |
|    n_updates       | 91667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.573   |
| time/              |          |
|    episodes        | 3056     |
|    fps             | 123      |
|    time_elapsed    | 5951     |
|    total_timesteps | 733440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.558   |
| time/              |          |
|    episodes        | 3060     |
|    fps             | 123      |
|    time_elapsed    | 5977     |
|    total_timesteps | 735360   |
| train/             |          |
|    actor_loss      | 0.299    |
|    critic_loss     | 0.000224 |
|    ent_coef        | 0.000891 |
|    ent_coef_loss   | -1.06    |
|    learning_rate   | 0.0003   |
|    n_updates       | 91907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.558   |
| time/              |          |
|    episodes        | 3064     |
|    fps             | 123      |
|    time_elapsed    | 5977     |
|    total_timesteps | 735360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.57    |
| time/              |          |
|    episodes        | 3068     |
|    fps             | 122      |
|    time_elapsed    | 6001     |
|    total_timesteps | 737280   |
| train/             |          |
|    actor_loss      | 0.337    |
|    critic_loss     | 0.00324  |
|    ent_coef        | 0.000857 |
|    ent_coef_loss   | 0.765    |
|    learning_rate   | 0.0003   |
|    n_updates       | 92147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.57    |
| time/              |          |
|    episodes        | 3072     |
|    fps             | 122      |
|    time_elapsed    | 6001     |
|    total_timesteps | 737280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.588   |
| time/              |          |
|    episodes        | 3076     |
|    fps             | 122      |
|    time_elapsed    | 6032     |
|    total_timesteps | 739200   |
| train/             |          |
|    actor_loss      | 0.303    |
|    critic_loss     | 0.000161 |
|    ent_coef        | 0.000836 |
|    ent_coef_loss   | -0.286   |
|    learning_rate   | 0.0003   |
|    n_updates       | 92387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.588   |
| time/              |          |
|    episodes        | 3080     |
|    fps             | 122      |
|    time_elapsed    | 6032     |
|    total_timesteps | 739200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.582   |
| time/              |          |
|    episodes        | 3084     |
|    fps             | 122      |
|    time_elapsed    | 6052     |
|    total_timesteps | 741120   |
| train/             |          |
|    actor_loss      | 0.402    |
|    critic_loss     | 0.00534  |
|    ent_coef        | 0.000826 |
|    ent_coef_loss   | 0.148    |
|    learning_rate   | 0.0003   |
|    n_updates       | 92627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.582   |
| time/              |          |
|    episodes        | 3088     |
|    fps             | 122      |
|    time_elapsed    | 6052     |
|    total_timesteps | 741120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.571   |
| time/              |          |
|    episodes        | 3092     |
|    fps             | 122      |
|    time_elapsed    | 6074     |
|    total_timesteps | 743040   |
| train/             |          |
|    actor_loss      | 0.401    |
|    critic_loss     | 0.00123  |
|    ent_coef        | 0.000841 |
|    ent_coef_loss   | 0.925    |
|    learning_rate   | 0.0003   |
|    n_updates       | 92867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.571   |
| time/              |          |
|    episodes        | 3096     |
|    fps             | 122      |
|    time_elapsed    | 6074     |
|    total_timesteps | 743040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.587   |
| time/              |          |
|    episodes        | 3100     |
|    fps             | 122      |
|    time_elapsed    | 6096     |
|    total_timesteps | 744960   |
| train/             |          |
|    actor_loss      | 0.319    |
|    critic_loss     | 0.00167  |
|    ent_coef        | 0.00086  |
|    ent_coef_loss   | 1.3      |
|    learning_rate   | 0.0003   |
|    n_updates       | 93107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.587   |
| time/              |          |
|    episodes        | 3104     |
|    fps             | 122      |
|    time_elapsed    | 6096     |
|    total_timesteps | 744960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.59    |
| time/              |          |
|    episodes        | 3108     |
|    fps             | 122      |
|    time_elapsed    | 6118     |
|    total_timesteps | 746880   |
| train/             |          |
|    actor_loss      | 0.305    |
|    critic_loss     | 0.000237 |
|    ent_coef        | 0.000869 |
|    ent_coef_loss   | -0.526   |
|    learning_rate   | 0.0003   |
|    n_updates       | 93347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.59    |
| time/              |          |
|    episodes        | 3112     |
|    fps             | 122      |
|    time_elapsed    | 6118     |
|    total_timesteps | 746880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.588   |
| time/              |          |
|    episodes        | 3116     |
|    fps             | 121      |
|    time_elapsed    | 6138     |
|    total_timesteps | 748800   |
| train/             |          |
|    actor_loss      | 0.301    |
|    critic_loss     | 0.000278 |
|    ent_coef        | 0.000899 |
|    ent_coef_loss   | -0.155   |
|    learning_rate   | 0.0003   |
|    n_updates       | 93587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.588   |
| time/              |          |
|    episodes        | 3120     |
|    fps             | 121      |
|    time_elapsed    | 6138     |
|    total_timesteps | 748800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 3124     |
|    fps             | 121      |
|    time_elapsed    | 6156     |
|    total_timesteps | 750720   |
| train/             |          |
|    actor_loss      | 0.306    |
|    critic_loss     | 0.000177 |
|    ent_coef        | 0.000895 |
|    ent_coef_loss   | -0.483   |
|    learning_rate   | 0.0003   |
|    n_updates       | 93827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.612   |
| time/              |          |
|    episodes        | 3128     |
|    fps             | 121      |
|    time_elapsed    | 6156     |
|    total_timesteps | 750720   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.629   |
| time/              |          |
|    episodes        | 3132     |
|    fps             | 121      |
|    time_elapsed    | 6176     |
|    total_timesteps | 752640   |
| train/             |          |
|    actor_loss      | 0.428    |
|    critic_loss     | 0.00898  |
|    ent_coef        | 0.000892 |
|    ent_coef_loss   | -1.3     |
|    learning_rate   | 0.0003   |
|    n_updates       | 94067    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.629   |
| time/              |          |
|    episodes        | 3136     |
|    fps             | 121      |
|    time_elapsed    | 6176     |
|    total_timesteps | 752640   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 3140     |
|    fps             | 121      |
|    time_elapsed    | 6196     |
|    total_timesteps | 754560   |
| train/             |          |
|    actor_loss      | 0.304    |
|    critic_loss     | 0.000168 |
|    ent_coef        | 0.000902 |
|    ent_coef_loss   | 0.525    |
|    learning_rate   | 0.0003   |
|    n_updates       | 94307    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 3144     |
|    fps             | 121      |
|    time_elapsed    | 6196     |
|    total_timesteps | 754560   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.629   |
| time/              |          |
|    episodes        | 3148     |
|    fps             | 121      |
|    time_elapsed    | 6214     |
|    total_timesteps | 756480   |
| train/             |          |
|    actor_loss      | 0.313    |
|    critic_loss     | 0.00033  |
|    ent_coef        | 0.000985 |
|    ent_coef_loss   | 0.702    |
|    learning_rate   | 0.0003   |
|    n_updates       | 94547    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.629   |
| time/              |          |
|    episodes        | 3152     |
|    fps             | 121      |
|    time_elapsed    | 6214     |
|    total_timesteps | 756480   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 3156     |
|    fps             | 121      |
|    time_elapsed    | 6235     |
|    total_timesteps | 758400   |
| train/             |          |
|    actor_loss      | 0.356    |
|    critic_loss     | 0.00497  |
|    ent_coef        | 0.000988 |
|    ent_coef_loss   | -0.406   |
|    learning_rate   | 0.0003   |
|    n_updates       | 94787    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 3160     |
|    fps             | 121      |
|    time_elapsed    | 6235     |
|    total_timesteps | 758400   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.645   |
| time/              |          |
|    episodes        | 3164     |
|    fps             | 121      |
|    time_elapsed    | 6254     |
|    total_timesteps | 760320   |
| train/             |          |
|    actor_loss      | 0.321    |
|    critic_loss     | 0.000361 |
|    ent_coef        | 0.000992 |
|    ent_coef_loss   | -0.632   |
|    learning_rate   | 0.0003   |
|    n_updates       | 95027    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.645   |
| time/              |          |
|    episodes        | 3168     |
|    fps             | 121      |
|    time_elapsed    | 6254     |
|    total_timesteps | 760320   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.621   |
| time/              |          |
|    episodes        | 3172     |
|    fps             | 121      |
|    time_elapsed    | 6277     |
|    total_timesteps | 762240   |
| train/             |          |
|    actor_loss      | 0.345    |
|    critic_loss     | 0.00164  |
|    ent_coef        | 0.000965 |
|    ent_coef_loss   | 0.189    |
|    learning_rate   | 0.0003   |
|    n_updates       | 95267    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.621   |
| time/              |          |
|    episodes        | 3176     |
|    fps             | 121      |
|    time_elapsed    | 6277     |
|    total_timesteps | 762240   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.627   |
| time/              |          |
|    episodes        | 3180     |
|    fps             | 121      |
|    time_elapsed    | 6303     |
|    total_timesteps | 764160   |
| train/             |          |
|    actor_loss      | 0.442    |
|    critic_loss     | 0.0016   |
|    ent_coef        | 0.000944 |
|    ent_coef_loss   | 0.166    |
|    learning_rate   | 0.0003   |
|    n_updates       | 95507    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.627   |
| time/              |          |
|    episodes        | 3184     |
|    fps             | 121      |
|    time_elapsed    | 6303     |
|    total_timesteps | 764160   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.65    |
| time/              |          |
|    episodes        | 3188     |
|    fps             | 121      |
|    time_elapsed    | 6330     |
|    total_timesteps | 766080   |
| train/             |          |
|    actor_loss      | 0.404    |
|    critic_loss     | 0.00308  |
|    ent_coef        | 0.000927 |
|    ent_coef_loss   | 0.299    |
|    learning_rate   | 0.0003   |
|    n_updates       | 95747    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.65    |
| time/              |          |
|    episodes        | 3192     |
|    fps             | 121      |
|    time_elapsed    | 6330     |
|    total_timesteps | 766080   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.653   |
| time/              |          |
|    episodes        | 3196     |
|    fps             | 120      |
|    time_elapsed    | 6357     |
|    total_timesteps | 768000   |
| train/             |          |
|    actor_loss      | 0.309    |
|    critic_loss     | 0.000533 |
|    ent_coef        | 0.000972 |
|    ent_coef_loss   | 0.111    |
|    learning_rate   | 0.0003   |
|    n_updates       | 95987    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.653   |
| time/              |          |
|    episodes        | 3200     |
|    fps             | 120      |
|    time_elapsed    | 6357     |
|    total_timesteps | 768000   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.633   |
| time/              |          |
|    episodes        | 3204     |
|    fps             | 120      |
|    time_elapsed    | 6378     |
|    total_timesteps | 769920   |
| train/             |          |
|    actor_loss      | 0.306    |
|    critic_loss     | 0.0002   |
|    ent_coef        | 0.000974 |
|    ent_coef_loss   | -0.54    |
|    learning_rate   | 0.0003   |
|    n_updates       | 96227    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.633   |
| time/              |          |
|    episodes        | 3208     |
|    fps             | 120      |
|    time_elapsed    | 6378     |
|    total_timesteps | 769920   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.641   |
| time/              |          |
|    episodes        | 3212     |
|    fps             | 120      |
|    time_elapsed    | 6399     |
|    total_timesteps | 771840   |
| train/             |          |
|    actor_loss      | 0.387    |
|    critic_loss     | 0.00272  |
|    ent_coef        | 0.000985 |
|    ent_coef_loss   | 0.333    |
|    learning_rate   | 0.0003   |
|    n_updates       | 96467    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.641   |
| time/              |          |
|    episodes        | 3216     |
|    fps             | 120      |
|    time_elapsed    | 6399     |
|    total_timesteps | 771840   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.639   |
| time/              |          |
|    episodes        | 3220     |
|    fps             | 120      |
|    time_elapsed    | 6421     |
|    total_timesteps | 773760   |
| train/             |          |
|    actor_loss      | 0.308    |
|    critic_loss     | 0.000194 |
|    ent_coef        | 0.000975 |
|    ent_coef_loss   | -0.56    |
|    learning_rate   | 0.0003   |
|    n_updates       | 96707    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.639   |
| time/              |          |
|    episodes        | 3224     |
|    fps             | 120      |
|    time_elapsed    | 6421     |
|    total_timesteps | 773760   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.611   |
| time/              |          |
|    episodes        | 3228     |
|    fps             | 120      |
|    time_elapsed    | 6442     |
|    total_timesteps | 775680   |
| train/             |          |
|    actor_loss      | 0.342    |
|    critic_loss     | 0.00722  |
|    ent_coef        | 0.000981 |
|    ent_coef_loss   | 0.676    |
|    learning_rate   | 0.0003   |
|    n_updates       | 96947    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.611   |
| time/              |          |
|    episodes        | 3232     |
|    fps             | 120      |
|    time_elapsed    | 6442     |
|    total_timesteps | 775680   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.624   |
| time/              |          |
|    episodes        | 3236     |
|    fps             | 120      |
|    time_elapsed    | 6462     |
|    total_timesteps | 777600   |
| train/             |          |
|    actor_loss      | 0.359    |
|    critic_loss     | 0.00237  |
|    ent_coef        | 0.000959 |
|    ent_coef_loss   | -0.164   |
|    learning_rate   | 0.0003   |
|    n_updates       | 97187    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.624   |
| time/              |          |
|    episodes        | 3240     |
|    fps             | 120      |
|    time_elapsed    | 6462     |
|    total_timesteps | 777600   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.621   |
| time/              |          |
|    episodes        | 3244     |
|    fps             | 120      |
|    time_elapsed    | 6482     |
|    total_timesteps | 779520   |
| train/             |          |
|    actor_loss      | 0.323    |
|    critic_loss     | 0.000259 |
|    ent_coef        | 0.000964 |
|    ent_coef_loss   | 0.715    |
|    learning_rate   | 0.0003   |
|    n_updates       | 97427    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.621   |
| time/              |          |
|    episodes        | 3248     |
|    fps             | 120      |
|    time_elapsed    | 6482     |
|    total_timesteps | 779520   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 3252     |
|    fps             | 120      |
|    time_elapsed    | 6504     |
|    total_timesteps | 781440   |
| train/             |          |
|    actor_loss      | 0.586    |
|    critic_loss     | 0.000762 |
|    ent_coef        | 0.00103  |
|    ent_coef_loss   | -0.4     |
|    learning_rate   | 0.0003   |
|    n_updates       | 97667    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 3256     |
|    fps             | 120      |
|    time_elapsed    | 6504     |
|    total_timesteps | 781440   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 3260     |
|    fps             | 120      |
|    time_elapsed    | 6526     |
|    total_timesteps | 783360   |
| train/             |          |
|    actor_loss      | 0.417    |
|    critic_loss     | 0.0107   |
|    ent_coef        | 0.00103  |
|    ent_coef_loss   | -0.268   |
|    learning_rate   | 0.0003   |
|    n_updates       | 97907    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.618   |
| time/              |          |
|    episodes        | 3264     |
|    fps             | 120      |
|    time_elapsed    | 6526     |
|    total_timesteps | 783360   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.625   |
| time/              |          |
|    episodes        | 3268     |
|    fps             | 119      |
|    time_elapsed    | 6549     |
|    total_timesteps | 785280   |
| train/             |          |
|    actor_loss      | 0.32     |
|    critic_loss     | 0.000161 |
|    ent_coef        | 0.000998 |
|    ent_coef_loss   | -0.0602  |
|    learning_rate   | 0.0003   |
|    n_updates       | 98147    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.625   |
| time/              |          |
|    episodes        | 3272     |
|    fps             | 119      |
|    time_elapsed    | 6549     |
|    total_timesteps | 785280   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.63    |
| time/              |          |
|    episodes        | 3276     |
|    fps             | 119      |
|    time_elapsed    | 6574     |
|    total_timesteps | 787200   |
| train/             |          |
|    actor_loss      | 0.322    |
|    critic_loss     | 0.000147 |
|    ent_coef        | 0.000976 |
|    ent_coef_loss   | -0.708   |
|    learning_rate   | 0.0003   |
|    n_updates       | 98387    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.63    |
| time/              |          |
|    episodes        | 3280     |
|    fps             | 119      |
|    time_elapsed    | 6574     |
|    total_timesteps | 787200   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 3284     |
|    fps             | 119      |
|    time_elapsed    | 6595     |
|    total_timesteps | 789120   |
| train/             |          |
|    actor_loss      | 0.324    |
|    critic_loss     | 0.000196 |
|    ent_coef        | 0.000939 |
|    ent_coef_loss   | -0.311   |
|    learning_rate   | 0.0003   |
|    n_updates       | 98627    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.637   |
| time/              |          |
|    episodes        | 3288     |
|    fps             | 119      |
|    time_elapsed    | 6595     |
|    total_timesteps | 789120   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 3292     |
|    fps             | 119      |
|    time_elapsed    | 6616     |
|    total_timesteps | 791040   |
| train/             |          |
|    actor_loss      | 0.382    |
|    critic_loss     | 0.000573 |
|    ent_coef        | 0.000938 |
|    ent_coef_loss   | 0.346    |
|    learning_rate   | 0.0003   |
|    n_updates       | 98867    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.649   |
| time/              |          |
|    episodes        | 3296     |
|    fps             | 119      |
|    time_elapsed    | 6616     |
|    total_timesteps | 791040   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.658   |
| time/              |          |
|    episodes        | 3300     |
|    fps             | 119      |
|    time_elapsed    | 6637     |
|    total_timesteps | 792960   |
| train/             |          |
|    actor_loss      | 0.325    |
|    critic_loss     | 0.000132 |
|    ent_coef        | 0.000917 |
|    ent_coef_loss   | 0.26     |
|    learning_rate   | 0.0003   |
|    n_updates       | 99107    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.658   |
| time/              |          |
|    episodes        | 3304     |
|    fps             | 119      |
|    time_elapsed    | 6637     |
|    total_timesteps | 792960   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 3308     |
|    fps             | 119      |
|    time_elapsed    | 6657     |
|    total_timesteps | 794880   |
| train/             |          |
|    actor_loss      | 0.459    |
|    critic_loss     | 0.0206   |
|    ent_coef        | 0.000891 |
|    ent_coef_loss   | 0.176    |
|    learning_rate   | 0.0003   |
|    n_updates       | 99347    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 3312     |
|    fps             | 119      |
|    time_elapsed    | 6657     |
|    total_timesteps | 794880   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 3316     |
|    fps             | 119      |
|    time_elapsed    | 6668     |
|    total_timesteps | 796800   |
| train/             |          |
|    actor_loss      | 0.32     |
|    critic_loss     | 0.000736 |
|    ent_coef        | 0.000864 |
|    ent_coef_loss   | -0.845   |
|    learning_rate   | 0.0003   |
|    n_updates       | 99587    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.654   |
| time/              |          |
|    episodes        | 3320     |
|    fps             | 119      |
|    time_elapsed    | 6668     |
|    total_timesteps | 796800   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.662   |
| time/              |          |
|    episodes        | 3324     |
|    fps             | 119      |
|    time_elapsed    | 6683     |
|    total_timesteps | 798720   |
| train/             |          |
|    actor_loss      | 0.326    |
|    critic_loss     | 0.000148 |
|    ent_coef        | 0.000818 |
|    ent_coef_loss   | 0.0262   |
|    learning_rate   | 0.0003   |
|    n_updates       | 99827    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 240      |
|    ep_rew_mean     | -0.662   |
| time/              |          |
|    episodes        | 3328     |
|    fps             | 119      |
|    time_elapsed    | 6683     |
|    total_timesteps | 798720   |
---------------------------------
In [102]:
res_unhedged = rollout_policy_on_exogenous_X(
    cm, X_parallel, policy="unhedged", action_max=u_scale, X_ref=X_ref
)

res_lq = rollout_policy_on_exogenous_X(
    cm, X_parallel, policy="lq",
    K_lq=K_lq, action_max=u_scale, X_ref=X_ref
)

res_rl = rollout_policy_on_exogenous_X(
    cm, X_parallel, policy="rl",
    rl_model=model_l1, action_max=u_scale, X_ref=X_ref
)
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\3068264565.py:22: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  u_t = -float(K_lq @ x_t)
In [103]:
eval_unhedged = eval_L1_cost(
    res_unhedged["NII"], res_unhedged["h"], res_unhedged["u"],
    lambda_h=lambda_h_star, kappa_u=kappa_u_rl
)

eval_lq = eval_L1_cost(
    res_lq["NII"], res_lq["h"], res_lq["u"],
    lambda_h=lambda_h_star, kappa_u=kappa_u_rl
)

eval_rl = eval_L1_cost(
    res_rl["NII"], res_rl["h"], res_rl["u"],
    lambda_h=lambda_h_star, kappa_u=kappa_u_rl
)

print("L1 cost (lower is better):")
print("Unhedged:", eval_unhedged)
print("LQ:", eval_lq)
print("RL:", eval_rl)
L1 cost (lower is better):
Unhedged: 0.0014518402932128523
LQ: 0.0014462204424909896
RL: 0.001430356297024223
In [104]:
plt.figure(figsize=(9,4))
plt.plot(res_unhedged["NII"], label="Unhedged")
plt.plot(res_lq["NII"], label="LQ (L2)")
plt.plot(res_rl["NII"], label="RL (L1)")
plt.title("Stress: +200bp parallel")
plt.xlabel("t (months)")
plt.ylabel("NII")
plt.legend()
plt.tight_layout()
plt.show()

plt.figure(figsize=(9,3))
plt.plot(res_lq["h"], label="LQ hedge")
plt.plot(res_rl["h"], label="RL hedge")
plt.title("Hedge inventory")
plt.legend()
plt.tight_layout()
plt.show()
No description has been provided for this image
No description has been provided for this image
In [105]:
# ============================================================
# 0) YOU PROVIDE THESE
# ============================================================
# You need a function that returns a rollout dict:
# {"NII": (T-1,), "h": (T,), "u": (T-1,)}
#

def get_rollout(cm, X_path, policy_name,
                K_lq=None, rl_model=None,
                X_ref=None, action_max=50.0):
    """
    Adapter around existing rollout_policy_on_exogenous_X.
    policy_name in {"unhedged","lq","rl"}.
    Returns dict with keys: NII, h, u.
    """
    if policy_name == "unhedged":
        res = rollout_policy_on_exogenous_X(cm, X_path, "unhedged",
                                            action_max=action_max, X_ref=X_ref)
    elif policy_name == "lq":
        res = rollout_policy_on_exogenous_X(cm, X_path, "lq",
                                            K_lq=K_lq, action_max=action_max, X_ref=X_ref)
    elif policy_name == "rl":
        res = rollout_policy_on_exogenous_X(cm, X_path, "rl",
                                            rl_model=rl_model, action_max=action_max, X_ref=X_ref)
    else:
        raise ValueError("policy_name must be 'unhedged', 'lq', or 'rl'")
    return {"NII": np.asarray(res["NII"]), "h": np.asarray(res["h"]), "u": np.asarray(res["u"])}


# ============================================================
# 1) METRICS + TABLES
# ============================================================

def l1_economic_cost(nii, h, u, lambda_h, kappa_u):
    """
    Evaluation functional for the L1 setting:
      mean( NII^2 + lambda_h*h^2 + kappa_u*|u| )
    (h aligned to t where NII,u exist => h[:-1])
    """
    nii = np.asarray(nii)
    h = np.asarray(h)
    u = np.asarray(u)
    return float(np.mean(nii**2 + lambda_h*(h[:-1]**2) + kappa_u*np.abs(u)))

def summarize_rollout(roll, lambda_h=None, kappa_u=None):
    nii, h, u = roll["NII"], roll["h"], roll["u"]
    out = {
        "std_NII": float(np.std(nii)),
        "p05_NII": float(np.quantile(nii, 0.05)),
        "min_NII": float(np.min(nii)),
        "mean_NII": float(np.mean(nii)),
        "mean_abs_u": float(np.mean(np.abs(u))),
        "max_abs_u": float(np.max(np.abs(u))),
        "mean_abs_h": float(np.mean(np.abs(h))),
        "max_abs_h": float(np.max(np.abs(h))),
    }
    if (lambda_h is not None) and (kappa_u is not None):
        out["L1_cost"] = l1_economic_cost(nii, h, u, lambda_h, kappa_u)
    return out

def build_metrics_tables(cm, scenarios, K_lq, rl_model,
                         X_ref, action_max,
                         lambda_h_eval, kappa_u_eval):
    """
    scenarios: dict name -> X_path
    Returns:
      df_metrics: multiindex (scenario, policy)
      df_norm: normalized to unhedged per scenario
    """
    rows = []
    for scen_name, X_path in scenarios.items():
        for pol, label in [("unhedged","Unhedged"), ("lq","LQ (L2)"), ("rl","RL (L1)")]:
            roll = get_rollout(cm, X_path, pol, K_lq=K_lq, rl_model=rl_model,
                               X_ref=X_ref, action_max=action_max)
            met = summarize_rollout(roll, lambda_h=lambda_h_eval, kappa_u=kappa_u_eval)
            rows.append({"scenario": scen_name, "policy": label, **met})

    df = pd.DataFrame(rows).set_index(["scenario","policy"]).sort_index()

    # Normalized table: divide each metric by unhedged metric (per scenario)
    metrics_cols = [c for c in df.columns if c not in []]
    df_norm = df.copy()
    for scen in df.index.get_level_values(0).unique():
        base = df.loc[(scen, "Unhedged"), metrics_cols]
        df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0

    return df, df_norm


# ============================================================
# 2) PLOTS
# ============================================================

def plot_nii_overlay(rolls, title):
    plt.figure(figsize=(9,4))
    for label, roll in rolls.items():
        plt.plot(roll["NII"], label=label)
    plt.title(title)
    plt.xlabel("t (months)")
    plt.ylabel("NII")
    plt.legend()
    plt.tight_layout()
    plt.show()

def plot_h_overlay(rolls, title):
    plt.figure(figsize=(9,3))
    for label, roll in rolls.items():
        plt.plot(roll["h"], label=label)
    plt.title(title)
    plt.xlabel("t (months)")
    plt.ylabel("Hedge inventory h_t")
    plt.legend()
    plt.tight_layout()
    plt.show()

def plot_absu_histogram(roll_lq, roll_rl, title):
    plt.figure(figsize=(8,4))
    plt.hist(np.abs(roll_lq["u"]), bins=40, alpha=0.6, label="LQ (L2)")
    plt.hist(np.abs(roll_rl["u"]), bins=40, alpha=0.6, label="RL (L1)")
    plt.title(title)
    plt.xlabel("|u_t| (absolute hedge change)")
    plt.ylabel("Frequency")
    plt.legend()
    plt.tight_layout()
    plt.show()

def plot_risk_turnover_scatter(df_metrics, title="Risk–turnover scatter"):
    """
    Scatter per scenario/policy:
      x = mean_abs_u, y = std_NII
    """
    plt.figure(figsize=(8,5))
    for (scen, pol), row in df_metrics.iterrows():
        x = row["mean_abs_u"]
        y = row["std_NII"]
        plt.scatter(x, y)
        plt.annotate(f"{scen}\n{pol}", (x, y), fontsize=8, xytext=(4,4), textcoords="offset points")

    plt.xlabel("Turnover proxy: mean(|u_t|)")
    plt.ylabel("Risk proxy: std(NII)")
    plt.title(title)
    plt.tight_layout()
    plt.show()


# ============================================================
# 3) RUN IT: define scenario dictionary + produce outputs
# ============================================================

# --- Provide scenario paths ---
scenarios = {
    "Baseline (smoothed)": X_smooth,
    "Stress: +200bp parallel": X_parallel,
    "Stress: high vol x3": X_highvol,
}

# --- Evaluation settings ---
ACTION_MAX = 50.0
LAMBDA_H_EVAL = lambda_h_star      # chosen hedge-inventory penalty
KAPPA_U_EVAL  = 3e-4               # same kappa_u from L1 RL training/eval

# --- Models from before ---
# K_lq: from Riccati solution for LQ benchmark
# model_l1: trained SAC model for RL (L1)

df_metrics, df_norm = build_metrics_tables(
    cm=cm,
    scenarios=scenarios,
    K_lq=K_lq,
    rl_model=model_l1,
    X_ref=X_ref,
    action_max=ACTION_MAX,
    lambda_h_eval=LAMBDA_H_EVAL,
    kappa_u_eval=KAPPA_U_EVAL
)

print("=== TABLE 1: Core metrics ===")
display(df_metrics)

print("=== TABLE 2: Normalized to Unhedged (Unhedged=100) ===")
display(df_norm)

# ============================================================
# 4) SELECTED PLOTS (6 total)
# ============================================================

# Helper to fetch rolls for a scenario
def get_rolls_for_scenario(X_path, scen_label):
    roll_u = get_rollout(cm, X_path, "unhedged", X_ref=X_ref, action_max=ACTION_MAX)
    roll_l = get_rollout(cm, X_path, "lq", K_lq=K_lq, X_ref=X_ref, action_max=ACTION_MAX)
    roll_r = get_rollout(cm, X_path, "rl", rl_model=model_l1, X_ref=X_ref, action_max=ACTION_MAX)
    return {
        "Unhedged": roll_u,
        "LQ (L2)": roll_l,
        "RL (L1)": roll_r
    }

# (1) Baseline NII overlay
rolls_base = get_rolls_for_scenario(X_smooth, "Baseline (smoothed)")
plot_nii_overlay(rolls_base, "Baseline: NII paths (Unhedged vs LQ vs RL-L1)")

# (2) Stress +200bp NII overlay
rolls_par = get_rolls_for_scenario(X_parallel, "Stress: +200bp parallel")
plot_nii_overlay(rolls_par, "Stress (+200bp parallel): NII paths")

# (3) Stress high-vol NII overlay
rolls_hv = get_rolls_for_scenario(X_highvol, "Stress: high vol x3")
plot_nii_overlay(rolls_hv, "Stress (high vol x3): NII paths")

# (4) Baseline hedge inventory overlay (LQ vs RL)
plot_h_overlay({"LQ (L2)": rolls_base["LQ (L2)"], "RL (L1)": rolls_base["RL (L1)"]},
               "Baseline: Hedge inventory (LQ vs RL-L1)")

# (5) Histogram of |u| (baseline) to show sparse trading
plot_absu_histogram(rolls_base["LQ (L2)"], rolls_base["RL (L1)"],
                    "Baseline: Trading sparsity (|u_t| histogram)")

# (6) Risk–turnover scatter using df_metrics (all scenarios/policies)
plot_risk_turnover_scatter(df_metrics, title="Risk–turnover map across scenarios (policy dots)")
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\3068264565.py:22: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  u_t = -float(K_lq @ x_t)
=== TABLE 1: Core metrics ===
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\635850829.py:91: RuntimeWarning: divide by zero encountered in divide
  df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\635850829.py:91: RuntimeWarning: invalid value encountered in divide
  df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\635850829.py:91: RuntimeWarning: divide by zero encountered in divide
  df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\635850829.py:91: RuntimeWarning: invalid value encountered in divide
  df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\635850829.py:91: RuntimeWarning: divide by zero encountered in divide
  df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\635850829.py:91: RuntimeWarning: invalid value encountered in divide
  df_norm.loc[scen, metrics_cols] = (df.loc[scen, metrics_cols].values / base.values) * 100.0
std_NII p05_NII min_NII mean_NII mean_abs_u max_abs_u mean_abs_h max_abs_h L1_cost
scenario policy
Baseline (smoothed) LQ (L2) 0.037672 -0.028228 -0.064571 0.028723 0.130948 0.445572 14.301091 19.293542 0.002502
RL (L1) 0.037807 -0.028121 -0.063079 0.029002 0.095940 0.362447 14.424503 20.110600 0.002524
Unhedged 0.041502 -0.028967 -0.069115 0.031514 0.000000 0.000000 0.000000 0.000000 0.002716
Stress: +200bp parallel LQ (L2) 0.010117 0.020354 0.020128 0.033284 0.050177 0.539936 14.585011 17.558518 0.001446
RL (L1) 0.010486 0.020406 0.020177 0.033765 0.040012 0.281692 12.640191 15.121961 0.001430
Unhedged 0.011522 0.021739 0.021490 0.036319 0.000000 0.000000 0.000000 0.000000 0.001452
Stress: high vol x3 LQ (L2) 0.028799 -0.010316 -0.054684 0.036726 0.090683 0.484180 13.971398 17.529290 0.002409
RL (L1) 0.029025 -0.010358 -0.055629 0.037088 0.064955 0.307148 12.191542 17.197047 0.002401
Unhedged 0.031582 -0.010008 -0.058414 0.040042 0.000000 0.000000 0.000000 0.000000 0.002601
=== TABLE 2: Normalized to Unhedged (Unhedged=100) ===
std_NII p05_NII min_NII mean_NII mean_abs_u max_abs_u mean_abs_h max_abs_h L1_cost
scenario policy
Baseline (smoothed) LQ (L2) 90.771525 97.450109 93.425399 91.145057 inf inf inf inf 92.151849
RL (L1) 91.096380 97.078798 91.266641 92.029153 inf inf inf inf 92.950930
Unhedged 100.000000 100.000000 100.000000 100.000000 NaN NaN NaN NaN 100.000000
Stress: +200bp parallel LQ (L2) 87.800881 93.628756 93.665736 91.642397 inf inf inf inf 99.612915
RL (L1) 91.001717 93.870465 93.893229 92.966375 inf inf inf inf 98.520223
Unhedged 100.000000 100.000000 100.000000 100.000000 NaN NaN NaN NaN 100.000000
Stress: high vol x3 LQ (L2) 91.188148 103.082277 93.614348 91.717293 inf inf inf inf 92.611408
RL (L1) 91.902659 103.493855 95.231757 92.622201 inf inf inf inf 92.335225
Unhedged 100.000000 100.000000 100.000000 100.000000 NaN NaN NaN NaN 100.000000
C:\Users\thoma\AppData\Local\Temp\ipykernel_14560\3068264565.py:22: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  u_t = -float(K_lq @ x_t)
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image