API Reference

This page documents the core API of pypecdp.

Browser

class pypecdp.browser.Browser(config=None, *, chrome_path='chromium', user_data_dir=None, clean_data_dir=True, headless=True, extra_args=None, ignore_default_args=None, env=None, **kwargs)

Bases: object

High-level browser automation via Chrome DevTools Protocol.

Manages the Chrome/Chromium browser process lifecycle and CDP message routing between tabs and the browser.

Attributes:
config

Configuration object for browser launch.

proc

The browser subprocess.

reader

Stream reader for CDP pipe communication.

writer

Stream writer for CDP pipe communication.

targets

Mapping of target IDs to Tab instances.

Class Attributes:
tab_class: Class to use for creating Tab instances. Override this in subclasses to use custom Tab implementations.

tab_class

alias of Tab
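
Because new tabs are created through the tab_class attribute rather than by naming Tab directly, a subclass only has to reassign that attribute. The self-contained sketch below shows the pattern with hypothetical stand-ins (BaseBrowser, BaseTab, and the _on_target_created hook are not pypecdp names); with pypecdp you would subclass pypecdp.browser.Browser and pypecdp.tab.Tab in the same way.

```python
from typing import Dict, Type


class BaseTab:
    """Hypothetical stand-in for pypecdp.tab.Tab."""

    def __init__(self, browser: "BaseBrowser", target_id: str) -> None:
        self.browser = browser
        self.target_id = target_id


class BaseBrowser:
    """Hypothetical stand-in for pypecdp.browser.Browser."""

    tab_class: Type[BaseTab] = BaseTab  # factory used for every new target

    def __init__(self) -> None:
        self.targets: Dict[str, BaseTab] = {}

    def _on_target_created(self, target_id: str) -> BaseTab:
        # Looked up through self, so a subclass's tab_class override applies.
        tab = self.tab_class(self, target_id)
        self.targets[target_id] = tab
        return tab


class LoggingTab(BaseTab):
    def describe(self) -> str:
        return f"tab {self.target_id}"


class LoggingBrowser(BaseBrowser):
    tab_class = LoggingTab  # the only change needed to use the custom tab type
```

The same approach applies to Tab.elem_class for custom Elem implementations.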

__init__(config=None, *, chrome_path='chromium', user_data_dir=None, clean_data_dir=True, headless=True, extra_args=None, ignore_default_args=None, env=None, **kwargs)

Initialize Browser instance.

Parameters:
  • config (Config | None) – Pre-configured Config instance. If None, a new Config will be created from the keyword arguments.

  • chrome_path (str) – Path to Chrome/Chromium executable.

  • user_data_dir (str | None) – Path to user data directory. If None, a temporary directory will be created.

  • clean_data_dir (bool) – Whether to remove an existing user data directory before starting. Set to False to preserve cookies, cache, and other browser state between runs.

  • headless (bool) – Whether to run in headless mode.

  • extra_args (list[str] | None) – Additional command-line arguments.

  • ignore_default_args (list[str] | None) – List of default args to ignore.

  • env (dict[str, str] | None) – Environment variables to set for the browser process.

  • **kwargs (Any) – Additional keyword arguments. Currently supported: ‘auto_attach’, which controls automatic target attachment, and ‘default_domains’, which selects the CDP domains to enable automatically on a target.

Return type:

None

async classmethod start(config=None, **kwargs)

Start a new Browser instance.

Parameters:
  • config (Config | None) – Pre-configured Config instance. If None, a new Config will be created from kwargs.

  • **kwargs (Any) – Arguments to pass to Config if config is None.

Returns:

An initialized and launched Browser instance.

Return type:

Browser

async close()

Close the browser and clean up resources.

Closes all tabs, terminates the browser process, and cancels background tasks. Shutdown proceeds in stages:

  • Attempts a graceful shutdown via the CDP browser.close() command.

  • Falls back to SIGTERM if needed.

  • Falls back to SIGKILL if the process doesn’t exit within 3 seconds.

Note

This method suppresses most errors to ensure cleanup completes even if the browser has already exited or crashed.

Return type:

None
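
The signal fallback can be illustrated with asyncio's subprocess API. This is a minimal sketch, not pypecdp's code: stop_process and its grace parameter are hypothetical, it omits the initial CDP browser.close() step, and a child that ignores SIGTERM stands in for a hung browser. The signal names apply to POSIX; on Windows, terminate() and kill() both call TerminateProcess().

```python
import asyncio
import sys


async def stop_process(proc: asyncio.subprocess.Process, grace: float = 3.0) -> int:
    """Terminate a child process, escalating to SIGKILL after `grace` seconds."""
    if proc.returncode is not None:
        return proc.returncode  # already exited (e.g. the browser crashed)
    proc.terminate()  # SIGTERM on POSIX
    try:
        return await asyncio.wait_for(proc.wait(), timeout=grace)
    except asyncio.TimeoutError:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        return await proc.wait()


async def demo() -> int:
    # A child that ignores SIGTERM, so only the SIGKILL fallback can stop it.
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(30)\n"
    )
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code, stdout=asyncio.subprocess.PIPE
    )
    await proc.stdout.readline()  # wait until the SIGTERM handler is installed
    return await stop_process(proc, grace=0.5)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # -9 on POSIX: the process had to be SIGKILLed
```

A negative return code -N means the child was killed by signal N, so -9 confirms that the SIGKILL stage was reached.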

async send(cmd, *, session_id=None, **kwargs)

Send a CDP command and await its response.

Parameters:
  • cmd (Any) – CDP command generator to send.

  • session_id (SessionID | None) – Optional session ID for tab-specific commands.

  • **kwargs (Any) –

    Optional keyword arguments:

    • ignore_errors (bool): If True, suppress CDP errors and return None instead of raising RuntimeError.

Returns:

The parsed response from the CDP command, or None if ignore_errors=True and an error occurred.

Raises:
  • RuntimeError – If the CDP command returns an error and ignore_errors is False (default).

  • ConnectionError – If the CDP pipe is closed.

Return type:

Any
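
The cmd parameter is a CDP command generator. Assuming these follow the convention of the python-chrome-devtools-protocol (cdp) package, where a command yields one request dict and receives the raw result through send(), the round trip over Chrome's --remote-debugging-pipe transport (NUL-terminated JSON messages) can be sketched as below. encode_command, decode_response, and get_version are hypothetical helpers, not pypecdp API.

```python
import itertools
import json
from typing import Any, Dict, Generator, Optional, Tuple

Command = Generator[Dict[str, Any], Dict[str, Any], Any]
_next_id = itertools.count(1)


def encode_command(cmd: Command, session_id: Optional[str] = None) -> Tuple[int, bytes]:
    """Prime the generator and frame its request for the CDP pipe."""
    request = next(cmd)  # e.g. {"method": "Browser.getVersion", "params": {...}}
    msg_id = next(_next_id)
    message = {"id": msg_id, **request}
    if session_id is not None:
        message["sessionId"] = session_id  # routes the command to one target
    return msg_id, json.dumps(message).encode() + b"\0"  # pipe messages end in NUL


def decode_response(cmd: Command, frame: bytes, ignore_errors: bool = False) -> Any:
    """Feed a response frame back into the generator and return its parsed result."""
    response = json.loads(frame.rstrip(b"\0"))
    if "error" in response:
        cmd.close()
        if ignore_errors:
            return None
        raise RuntimeError(f"CDP error: {response['error'].get('message')}")
    try:
        cmd.send(response.get("result", {}))
    except StopIteration as done:
        return done.value  # the generator's return value is the parsed result
    raise RuntimeError("command generator yielded more than one request")


def get_version() -> Command:
    """Hypothetical command in the same style: yield a request, parse the reply."""
    result = yield {"method": "Browser.getVersion"}
    return result["product"]
```

In an asynchronous client, the msg_id would typically key an asyncio.Future that the pipe reader resolves when the response with the matching id arrives.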

clear_handlers()

Clear all registered event handlers for this browser.

Return type:

None

async create_tab(url='about:blank')

Create a new tab and navigate to a URL.

Parameters:

url (str) – The URL to navigate to in the new tab.

Returns:

The newly created tab.

Return type:

Tab

async navigate(url, new_tab=False, timeout=10.0)

Navigate to a URL in a tab.

Parameters:
  • url (str) – The URL to navigate to.

  • new_tab (bool) – If True, create a new tab. If False, reuse an existing tab if available.

  • timeout (float) – Maximum seconds to wait for page load.

Returns:

The tab that was navigated to the URL.

Return type:

Tab

on(event_name, handler)

Register an event handler for browser-level events.

Parameters:
  • event_name (type[Any]) – The CDP event type to listen for.

  • handler (Callable[[Any], Any]) – Callback function or coroutine to handle the event.

Return type:

None

async cookies()

Get all cookies for the browser.

Retrieves cookies from Chrome via CDP and converts them to a standard Python CookieJar. The returned CookieJar contains http.cookiejar.Cookie objects that are compatible with urllib, requests, and other HTTP libraries.

Note

The original CDP cookies (list[cdp.network.Cookie]) are preserved in the returned CookieJar’s cdp_cookies attribute. This allows access to CDP-specific cookie properties (priority, source_scheme, source_port, same_site, partition_key, etc.) that aren’t available in the standard cookiejar.Cookie objects.

Returns:

A CookieJar (subclass of http.cookiejar.CookieJar) containing all browser cookies. The jar.cdp_cookies attribute contains the original CDP cookie objects.

Return type:

CookieJar

property pid: int | None

Get the browser process ID.

Returns:

The process ID if the browser is running, else None.

Return type:

int | None

property first_tab: Tab | None

Get the first active tab.

Returns:

A Tab instance if one exists, else None.

Return type:

Tab | None

Tab

class pypecdp.tab.Tab(browser, target_id, target_info=None)

Bases: object

Represents a browser tab/target with CDP session.

Manages a CDP session for a specific target, handles event dispatching, and provides methods for navigation and DOM queries.

Parameters:
  • browser (Browser)

  • target_id (cdp.target.TargetID)

  • target_info (cdp.target.TargetInfo | None)

browser

The parent Browser instance.

target_id

CDP target identifier.

target_info

Optional target metadata.

session_id

CDP session ID for this tab.

Class Attributes:
elem_class: Class to use for creating Elem instances. Override this in subclasses to use custom Elem implementations.

elem_class

alias of Elem

__init__(browser, target_id, target_info=None)

Initialize a Tab instance.

Parameters:
  • browser (Browser) – The Browser instance managing this tab.

  • target_id (cdp.target.TargetID) – CDP target identifier.

  • target_info (cdp.target.TargetInfo | None) – Optional target metadata.

Return type:

None

async send(cmd, **kwargs)

Send a CDP command within this tab’s session.

Parameters:
  • cmd (Any) – CDP command generator to send.

  • kwargs (Any)

Returns:

The parsed response from the CDP command.

Raises:

RuntimeError – If the tab is not attached or command fails.

Return type:

Any

on(event_name, handler)

Register an event handler for tab-level CDP events.

Parameters:
  • event_name (type[Any]) – The CDP event type to listen for.

  • handler (Callable[[Any], Any]) – Callback function or coroutine to handle events.

Return type:

None

async handle_event(event)

Dispatch a CDP event to registered handlers.

Parameters:

event (Any) – The CDP event object to dispatch.

Return type:

None
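
Because a handler may be either a plain function or a coroutine function, dispatch has to call each handler and await only the results that are awaitable. A minimal sketch of that pattern, using a hypothetical EventRegistry keyed by event class and a stand-in event type (this is not pypecdp's implementation):

```python
import asyncio
import inspect
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Type


class EventRegistry:
    """Hypothetical handler registry keyed by CDP event class."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[Type[Any], List[Callable[[Any], Any]]] = defaultdict(list)

    def on(self, event_type: Type[Any], handler: Callable[[Any], Any]) -> None:
        self._handlers[event_type].append(handler)

    def clear_handlers(self) -> None:
        self._handlers.clear()

    async def handle_event(self, event: Any) -> None:
        # Look up handlers by the event's exact class; unknown events are ignored.
        for handler in list(self._handlers.get(type(event), [])):
            result = handler(event)
            if inspect.isawaitable(result):  # coroutine handlers get awaited
                await result


class LoadEventFired:
    """Stand-in for a CDP event class such as cdp.page.LoadEventFired."""

    def __init__(self, timestamp: float) -> None:
        self.timestamp = timestamp


async def demo() -> list:
    seen = []
    registry = EventRegistry()
    registry.on(LoadEventFired, lambda e: seen.append(("sync", e.timestamp)))

    async def on_load(e: LoadEventFired) -> None:
        await asyncio.sleep(0)  # a coroutine handler may await other work
        seen.append(("async", e.timestamp))

    registry.on(LoadEventFired, on_load)
    await registry.handle_event(LoadEventFired(1.5))
    await registry.handle_event(object())  # no handlers registered: ignored
    return seen
```

Handlers run sequentially in registration order here; a design that must not let one slow handler delay the others could schedule each coroutine with asyncio.create_task() instead.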

clear_handlers()

Clear all registered event handlers for this tab.

Return type:

None

async attach()

Attach a CDP session to this tab.

This method is used for manual tab attachment when auto_attach is disabled in the Browser configuration. If auto_attach is enabled (default), tabs are attached automatically by the Browser.

Returns:

The session ID for this tab after attachment.

If already attached, returns the existing session ID.

Return type:

SessionID

Raises:

RuntimeError – If the CDP attach_to_target command fails.

async navigate(url, timeout=10.0)

Navigate to a URL and wait for page load.

Parameters:
  • url (str) – The URL to navigate to.

  • timeout (float) – Maximum seconds to wait for load event. Set to 0 to skip waiting.

Return type:

None

async wait_for_event(event=<class 'pypecdp.cdp.page.LoadEventFired'>, timeout=10.0)

Wait for a specific CDP event to occur.

Parameters:
  • event (type[Any]) – The CDP event type to wait for.

  • timeout (float) – Maximum seconds to wait. Timeout errors are suppressed.

Return type:

None
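
One way to wait for an event without raising on timeout is to register a one-shot future that the event dispatcher resolves, then await it under asyncio.wait_for() with the timeout error suppressed. The EventWaiter class below is a hypothetical sketch of that pattern; unlike the documented method, which returns None, it returns a bool so the outcome is observable.

```python
import asyncio
import contextlib
from typing import Any, Dict, List, Type


class EventWaiter:
    """Hypothetical one-shot waiters keyed by event type."""

    def __init__(self) -> None:
        self._waiters: Dict[Type[Any], List[asyncio.Future]] = {}

    def dispatch(self, event: Any) -> None:
        # Resolve every pending waiter for this event's type.
        for fut in self._waiters.pop(type(event), []):
            if not fut.done():
                fut.set_result(event)

    async def wait_for_event(self, event_type: Type[Any], timeout: float = 10.0) -> bool:
        fut = asyncio.get_running_loop().create_future()
        self._waiters.setdefault(event_type, []).append(fut)
        try:
            with contextlib.suppress(asyncio.TimeoutError):
                await asyncio.wait_for(fut, timeout)
                return True
            return False  # timed out: the error is swallowed, not raised
        finally:
            waiters = self._waiters.get(event_type, [])
            if fut in waiters:
                waiters.remove(fut)  # drop the stale waiter after a timeout


class FrameStoppedLoading:
    """Stand-in for a CDP event class."""


async def demo() -> tuple:
    waiter = EventWaiter()
    loop = asyncio.get_running_loop()
    loop.call_later(0.05, waiter.dispatch, FrameStoppedLoading())
    arrived = await waiter.wait_for_event(FrameStoppedLoading, timeout=1.0)
    missed = await waiter.wait_for_event(FrameStoppedLoading, timeout=0.05)
    return arrived, missed
```

Registering the future before the triggering command is sent matters: an event that fires before anyone is waiting is simply missed.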

async eval(expression, await_promise=True)

Evaluate a JavaScript expression in the page context.

Parameters:
  • expression (str) – JavaScript code to evaluate.

  • await_promise (bool) – Whether to await if expression returns a Promise.

Returns:

The result of the evaluation.

Return type:

RemoteObject
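
eval() returns a Runtime.RemoteObject rather than a plain Python value. In the CDP wire format, JSON-compatible primitives arrive in value, while numbers JSON cannot represent (NaN, Infinity, -Infinity, -0, and BigInt literals such as 12n) arrive as strings in unserializableValue. The hypothetical helper below unwraps the JSON form of a RemoteObject; pypecdp's RemoteObject class may expose the same fields under snake_case attribute names.

```python
import math
from typing import Any, Dict

# CDP spellings of values that plain JSON cannot carry.
_UNSERIALIZABLE = {
    "NaN": math.nan,
    "Infinity": math.inf,
    "-Infinity": -math.inf,
    "-0": -0.0,
}


def unwrap_remote_object(obj: Dict[str, Any]) -> Any:
    """Convert the JSON form of a CDP Runtime.RemoteObject to a Python value."""
    if obj.get("type") == "undefined":
        return None
    special = obj.get("unserializableValue")
    if special is not None:
        if special.endswith("n"):  # BigInt literal such as "12n"
            return int(special[:-1])
        return _UNSERIALIZABLE[special]
    if "value" in obj:
        return obj["value"]  # primitives, or objects when returnByValue was set
    # Non-primitive without a serialized value: fall back to its description.
    return obj.get("description")
```

Objects such as DOM nodes normally arrive only as an objectId plus a description, so the fallback returns a human-readable summary rather than the object itself.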

async find_elems(query, depth=100, pierce=True)

Find all elements matching the specified query.

Searches from the document root and includes iframes. To search within a specific element, use Elem.query_selector().

Parameters:
  • query (str) – Plain text, CSS selector, or XPath search query.

  • depth (int) – Max depth to retrieve the document node.

  • pierce (bool) – Whether to pierce shadow DOM boundaries.

Returns:

List of matching elements, empty if nothing found.

Return type:

list[Elem]

async wait_for_elems(query, timeout=10.0, **kwargs)

Wait for elements matching the specified query to appear.

Parameters:
  • query (str) – Plain text, CSS selector, or XPath search query.

  • timeout (float) – Maximum seconds to wait.

  • **kwargs (Any) – Additional keyword arguments, e.g. depth and pierce (passed to find_elems()) and poll (the polling interval in seconds).

Returns:

List of matching elements, empty if timeout.

Return type:

list[Elem]
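
Waiting for elements amounts to re-running a query until it returns something or a deadline passes. A generic polling sketch under that assumption: poll_until is hypothetical, and its 0.05-second default simply mirrors the poll default of Elem.wait_for_selector().

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def poll_until(
    query: Callable[[], Awaitable[T]],
    timeout: float = 10.0,
    poll: float = 0.05,
) -> T:
    """Re-run `query` until it returns a truthy result or `timeout` elapses.

    Returns the last result, so a list-returning query yields [] on timeout
    and an Optional-returning one yields None.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await query()
        if result or loop.time() >= deadline:
            return result
        # Never sleep past the deadline.
        await asyncio.sleep(min(poll, max(0.0, deadline - loop.time())))


async def demo() -> tuple:
    calls = 0

    async def fake_find_elems() -> list:
        nonlocal calls
        calls += 1
        return ["<elem>"] if calls >= 3 else []  # "appears" on the third try

    found = await poll_until(fake_find_elems, timeout=1.0, poll=0.01)

    async def never() -> list:
        return []

    missing = await poll_until(never, timeout=0.05, poll=0.01)
    return found, calls, missing
```

Returning the last result instead of raising matches the documented contract of returning an empty list (or None) on timeout.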

async find_elem(query, depth=100, pierce=True)

Find the first element matching the specified query.

Searches from the document root and includes iframes. To search within a specific element, use Elem.query_selector().

Parameters:
  • query (str) – Plain text, CSS selector, or XPath search query.

  • depth (int) – Max depth to retrieve the document node.

  • pierce (bool) – Whether to pierce shadow DOM boundaries.

Returns:

The first matching element, or None if not found.

Return type:

Elem | None

async wait_for_elem(query, timeout=10.0, **kwargs)

Wait for an element matching the specified query to appear.

Parameters:
  • query (str) – Plain text, CSS selector, or XPath search query.

  • timeout (float) – Maximum seconds to wait.

  • **kwargs (Any) – Additional keyword arguments, e.g. depth and pierce (passed to find_elem()) and poll (the polling interval in seconds).

Returns:

The matching element, or None if timeout.

Return type:

Elem | None

async close()

Close this tab.

Sends a close target command. Errors are suppressed if the tab is already closed or connection is lost.

Return type:

None

property parent: Tab | None

Get the parent tab if this tab is a child frame.

This property is useful for navigating iframe hierarchies. Top-level tabs (pages) will have no parent, while iframes and nested frames will return their parent tab.

Returns:

The parent Tab instance if this is a frame/iframe, or None if this is a top-level page or the parent is not found.

Return type:

Tab | None

Example

>>> if tab.parent:
...     print(f"Frame in: {tab.parent.url}")
... else:
...     print("Top-level tab")

elem(node_id)

Create an Elem instance from a CDP NodeId.

Searches the document tree for the node with the specified ID and wraps it in an Elem instance for interaction.

Parameters:

node_id (NodeId) – The NodeId of the DOM element to find.

Returns:

The created Elem instance wrapping the found node.

Return type:

Elem

Raises:

ValueError – If the tab document is not loaded or if the node with the specified ID is not found.
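
Finding a node by NodeId is a depth-first walk over the document tree. The sketch below works on the JSON form of a CDP DOM.Node and, besides children, also descends into an iframe's contentDocument and a shadow host's shadowRoots, which DOM.getDocument includes when called with pierce=True. iter_nodes and find_node are hypothetical helpers that mirror the two documented ValueError cases.

```python
from typing import Any, Dict, Iterator, Optional

Node = Dict[str, Any]


def iter_nodes(node: Node) -> Iterator[Node]:
    """Depth-first walk over a DOM.Node tree, including frames and shadow roots."""
    yield node
    for key in ("children", "shadowRoots"):
        for child in node.get(key, []):
            yield from iter_nodes(child)
    if "contentDocument" in node:  # iframe document, present when pierced
        yield from iter_nodes(node["contentDocument"])


def find_node(document: Optional[Node], node_id: int) -> Node:
    """Return the node with `node_id`, raising ValueError like Tab.elem()."""
    if document is None:
        raise ValueError("document not loaded")
    for node in iter_nodes(document):
        if node.get("nodeId") == node_id:
            return node
    raise ValueError(f"node {node_id} not found")
```

Nodes deeper than the depth passed to DOM.getDocument are not in the tree at all, so a missing node can also mean the document was fetched too shallowly.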

Elem

class pypecdp.elem.Elem(tab, node)

Bases: object

Wrapper for DOM elements with interaction methods.

Provides high-level methods for interacting with elements in the browser, including clicking, typing, and retrieving attributes.

Parameters:
  • tab (Tab)

  • node (cdp.dom.Node)

tab

The Tab instance containing this element.

Type:

Tab

node

The CDP Node object representing the DOM element.

Type:

cdp.dom.Node

Note

Additional node properties like node_id and backend_node_id are accessible via __getattr__ delegation to the node object.

tab: Tab
node: cdp.dom.Node
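
The __getattr__ delegation mentioned in the note can be sketched as follows. NodeWrapper and FakeNode are hypothetical stand-ins for Elem and cdp.dom.Node. Python calls __getattr__ only after normal attribute lookup fails, so the wrapper's own attributes and methods always take precedence over the node's fields.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeNode:
    """Stand-in for cdp.dom.Node with a few of its snake_case fields."""

    node_id: int
    backend_node_id: int
    node_name: str


class NodeWrapper:
    def __init__(self, tab: Any, node: FakeNode) -> None:
        self.tab = tab
        self.node = node

    def __getattr__(self, name: str) -> Any:
        # Reached only for names not found on the wrapper itself.
        if name == "node":  # avoid infinite recursion if `node` was never set
            raise AttributeError(name)
        return getattr(self.node, name)
```

Unknown names still raise AttributeError, because the final getattr() on the node raises it.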

async scroll_into_view()

Scroll element into viewport and attempt to focus it.

Errors are suppressed if the element is detached or hidden.

Raises:

ReferenceError – If the tab session is no longer active.

Return type:

None

async focus()

Set focus to the element.

Suppresses errors if the element is not focusable.

Raises:

ReferenceError – If the tab session is no longer active.

Return type:

None

async position()

Get the position and coordinates of the element.

Returns:

Container with element coordinates, or None if unavailable.

Return type:

Position | None

Raises:

ReferenceError – If the tab session is no longer active.

async click(button=MouseButton.LEFT, click_count=1, delay=0.02)

Click the element at its center point.

Scrolls the element into view, calculates the center, and dispatches mouse press and release events. Returns the top-level tab, which is useful when the click triggers navigation.

Parameters:
  • button (cdp.input_.MouseButton) – Mouse button to use (default: LEFT).

  • click_count (int) – Number of clicks (1 for single, 2 for double).

  • delay (float) – Delay in seconds between press and release.

Returns:

The current top-level Tab containing this element, or None if the element position cannot be determined.

Return type:

Tab | None

Raises:

ReferenceError – If the tab session is no longer active.

Example

>>> link = await tab.wait_for_elem('a[href="/next"]')
>>> current_tab = await link.click()
>>> if current_tab:
...     await current_tab.wait_for_event(cdp.page.LoadEventFired)
...     print(f"Navigated to: {current_tab.url}")
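
The click sequence can be sketched at the protocol level. DOM.getBoxModel reports an element's content quad as eight numbers (x1, y1, ..., x4, y4) in CSS pixels, and a click is an Input.dispatchMouseEvent press followed by a release at one point. quad_center and click_events are hypothetical helpers; pypecdp may derive the center differently, for example from its own Position object.

```python
from typing import Any, Dict, List, Sequence, Tuple


def quad_center(quad: Sequence[float]) -> Tuple[float, float]:
    """Average the four corner points of a CDP quad [x1, y1, ..., x4, y4]."""
    xs, ys = quad[0::2], quad[1::2]
    return sum(xs) / 4, sum(ys) / 4


def click_events(
    quad: Sequence[float], button: str = "left", click_count: int = 1
) -> List[Dict[str, Any]]:
    """Build the Input.dispatchMouseEvent commands for one click at the quad's center."""
    x, y = quad_center(quad)
    return [
        {
            "method": "Input.dispatchMouseEvent",
            "params": {"type": kind, "x": x, "y": y, "button": button, "clickCount": click_count},
        }
        for kind in ("mousePressed", "mouseReleased")
    ]
```

A client would send the two commands in order with a short pause between them, which is the role of the delay parameter documented above.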

async type(text)

Type text into the element.

Focuses the element and inserts the text via CDP input command.

Parameters:

text (str) – The text string to type.

Raises:

ReferenceError – If the tab session is no longer active.

Return type:

None

async set_value(value)

Set the value property of the element directly.

Attempts to resolve the element to a RemoteObject and set its value property via JavaScript. This method also dispatches an ‘input’ event to trigger any listeners. Falls back to typing character-by-character if resolution fails.

This is faster than type() for setting form field values but may not trigger all the same events as real user typing.

Parameters:

value (str) – The value to set.

Raises:

ReferenceError – If the tab session is no longer active.

Return type:

None

async text()

Get the text content of the element.

Returns:

The text content, or None if unavailable.

Return type:

str | None

Raises:

ReferenceError – If the tab session is no longer active.

async html(include_shadow_dom=True)

Get the outer HTML of the element.

Parameters:

include_shadow_dom (bool) – Whether to include shadow DOM content.

Returns:

The outer HTML string.

Return type:

str

Raises:

ReferenceError – If the tab session is no longer active.

async attribute(name)

Get the value of an attribute.

Parameters:

name (str) – The attribute name to retrieve.

Returns:

The attribute value, or None if not found.

Return type:

str | None

Raises:

ReferenceError – If the tab session is no longer active.
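
In CDP, a DOM.Node carries its attributes as one flat list of alternating names and values, for example ["id", "login", "class", "btn primary"], and DOM.getAttributes returns the same interleaved format. The hypothetical helper below performs the lookup attribute() describes on such a list.

```python
from typing import List, Optional


def get_attribute(attributes: List[str], name: str) -> Optional[str]:
    """Look up `name` in a flat CDP attribute list [name1, value1, name2, value2, ...]."""
    for attr_name, value in zip(attributes[0::2], attributes[1::2]):
        if attr_name == name:
            return value
    return None  # attribute not present
```

Boolean attributes such as disabled appear with an empty-string value, so a present-but-empty attribute returns "" rather than None.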

async query_selector(selector)

Find a child element matching the selector.

Parameters:

selector (str) – The CSS selector string.

Returns:

The found Elem or None if not found.

Return type:

Elem | None

Raises:

ReferenceError – If the tab session is no longer active.

async wait_for_selector(selector, timeout=10.0, poll=0.05)

Wait for a child element matching the selector to appear.

Parameters:
  • selector (str) – CSS selector string.

  • timeout (float) – Maximum seconds to wait.

  • poll (float) – Polling interval in seconds.

Returns:

The matching element, or None if timeout.

Return type:

Elem | None

Raises:

ReferenceError – If the tab session is no longer active.

property parent: Elem | None

Get the parent element of this Elem.

Useful for traversing up the DOM tree. Can be chained to access ancestors, e.g. elem.parent.parent (each step may be None).

Example:

# Navigate up to find a containing form
button = await tab.find_elem("button[type=submit]")
form = button.parent if button else None  # find_elem() may return None
while form and form.node_name != "FORM":
    form = form.parent

Returns:

The parent Elem, or None if this is a root element (no parent_id) or if the parent is the document root.

Return type:

Elem | None

Config

class pypecdp.config.Config(chrome_path='chromium', user_data_dir=None, clean_data_dir=True, headless=True, extra_args=<factory>, ignore_default_args=None, env=<factory>)

Bases: object

Configuration for launching Chrome/Chromium with CDP pipe.

Parameters:
chrome_path

Path to Chrome/Chromium executable.

Type:

str

user_data_dir

Path to user data directory. If None, a temporary directory will be created.

Type:

str | None

clean_data_dir

Whether to remove existing user data directory before starting. Defaults to True. Set to False to preserve cookies, cache, and other browser state between runs.

Type:

bool

headless

Whether to run in headless mode.

Type:

bool

extra_args

Additional command-line arguments to pass.

Type:

list[str]

ignore_default_args

List of default args to ignore.

Type:

list[str] | None

env

Environment variables to set for the browser process.

Type:

dict[str, str]

Example

>>> config = Config(
...     chrome_path="chromium",
...     clean_data_dir=False,  # Preserve profile
...     headless=True
... )

chrome_path: str = 'chromium'
user_data_dir: str | None = None
clean_data_dir: bool = True
headless: bool = True
extra_args: list[str]
ignore_default_args: list[str] | None = None
env: dict[str, str]

ensure_user_data_dir()

Ensure user data directory exists and return its path.

If user_data_dir is not set, creates a temporary directory.

Returns:

Path to the user data directory.

Return type:

str
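
The sketch below shows how user_data_dir and clean_data_dir could interact when the profile directory is prepared. prepare_profile_dir is hypothetical, and applying clean_data_dir at this step is an assumption about where pypecdp performs the cleanup.

```python
import os
import shutil
import tempfile
from typing import Optional


def prepare_profile_dir(user_data_dir: Optional[str], clean: bool = True) -> str:
    """Return a usable Chrome profile directory path."""
    if user_data_dir is None:
        return tempfile.mkdtemp(prefix="chrome-profile-")  # fresh, empty directory
    if clean and os.path.isdir(user_data_dir):
        shutil.rmtree(user_data_dir)  # discard cookies, cache, and other state
    os.makedirs(user_data_dir, exist_ok=True)
    return user_data_dir
```

Passing clean=False keeps existing files, which is what makes a persistent profile (logged-in sessions, stored cookies) survive between runs.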

build_argv()

Build command-line arguments for Chrome launch.

Constructs the full argument list including headless mode, pipe debugging, user data directory, and extra args. Filters out arguments specified in ignore_default_args.

Returns:

Complete list of command-line arguments.

Return type:

list[str]
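
The filtering step can be sketched as follows. The default flag list is hypothetical: each entry is a real Chromium switch, but pypecdp's actual defaults and ordering may differ. Matching on the flag name before any "=value" lets an entry such as "--headless" in ignore_default_args drop "--headless=new". Ignoring --remote-debugging-pipe would disable the CDP pipe that the library communicates over.

```python
from typing import List, Optional

# Hypothetical defaults: real Chromium switches, not necessarily pypecdp's list.
DEFAULT_ARGS = [
    "--remote-debugging-pipe",  # CDP over fds 3 (commands in) and 4 (messages out)
    "--no-first-run",
    "--no-default-browser-check",
]


def build_argv(
    chrome_path: str,
    user_data_dir: str,
    headless: bool = True,
    extra_args: Optional[List[str]] = None,
    ignore_default_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble the Chrome command line, dropping ignored default flags."""
    defaults = DEFAULT_ARGS + [f"--user-data-dir={user_data_dir}"]
    if headless:
        defaults.append("--headless=new")
    ignored = set(ignore_default_args or [])
    # Drop a default if either its full text or its bare flag name is ignored.
    kept = [a for a in defaults if a not in ignored and a.split("=", 1)[0] not in ignored]
    return [chrome_path, *kept, *(extra_args or [])]
```

Extra arguments are appended after the defaults and are never filtered, since ignore_default_args only applies to defaults.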

build_env()

Build environment variables for Chrome process.

Merges current environment with custom overrides.

Returns:

Complete environment variable mapping.

Return type:

dict[str, str]
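
Merging the inherited environment with custom overrides is a dictionary update in which the overrides win. A minimal sketch with a hypothetical function name:

```python
import os
from typing import Dict, Mapping, Optional


def build_env(overrides: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the current environment and apply per-browser overrides on top."""
    env = dict(os.environ)  # a copy, so os.environ itself is left untouched
    env.update(overrides or {})
    return env
```

Copying first keeps the overrides scoped to the browser process instead of leaking into the Python process that launched it.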

CookieJar

class pypecdp.util.CookieJar(cdp_cookies=None)

Bases: CookieJar

Custom CookieJar for pypecdp.

Inherits from http.cookiejar.CookieJar to manage cookies within the pypecdp browser context, and converts CDP cookies to standard Python cookiejar.Cookie objects.

The original CDP cookies are preserved in the cdp_cookies attribute, allowing access to CDP-specific properties (priority, source_scheme, source_port, same_site, partition_key, etc.) that aren’t available in standard cookiejar.Cookie objects.

cdp_cookies

List of original cdp.network.Cookie objects used to populate this CookieJar. None if the jar was created empty.

Parameters:

cdp_cookies (list[cdp.network.Cookie] | None) – Optional list of CDP cookies to populate the jar.
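
The conversion can be sketched with the standard library alone. cdp_to_cookie and cookie_header are hypothetical helpers. cdp_to_cookie takes the JSON form of a CDP Network.Cookie (camelCase keys, expires in seconds since the epoch, session set for session cookies), whereas pypecdp works with cdp.network.Cookie objects whose fields use snake_case names. Fields with no http.cookiejar equivalent, such as sameSite and priority, are dropped here, which is why keeping the originals in cdp_cookies is useful. cookie_header shows the resulting jar producing the Cookie header urllib would send.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar
from typing import Any, Dict


def cdp_to_cookie(c: Dict[str, Any]) -> Cookie:
    """Convert one CDP Network.Cookie (JSON form) to an http.cookiejar.Cookie."""
    domain = c["domain"]
    # Session cookies have no expiry; CDP marks them with session=True.
    expires = None if c.get("session") or c.get("expires", -1) < 0 else int(c["expires"])
    return Cookie(
        version=0,
        name=c["name"],
        value=c["value"],
        port=None,
        port_specified=False,
        domain=domain,
        domain_specified=domain.startswith("."),  # leading dot: domain cookie
        domain_initial_dot=domain.startswith("."),
        path=c.get("path", "/"),
        path_specified=True,
        secure=c.get("secure", False),
        expires=expires,
        discard=expires is None,
        comment=None,
        comment_url=None,
        rest={"HttpOnly": None} if c.get("httpOnly") else {},
    )


def cookie_header(jar: CookieJar, url: str) -> str:
    """Return the Cookie header urllib would send for `url` (empty if none)."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)
    return request.get_header("Cookie", "")
```

Because the result is an ordinary http.cookiejar.Cookie, the standard policy checks apply: a secure cookie is withheld from plain-HTTP URLs, and cookies for another domain are not sent.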

__init__(cdp_cookies=None)

Initialize the CookieJar with optional CDP cookies.

Parameters:

cdp_cookies (list[cdp.network.Cookie] | None) – List of CDP network.Cookie objects to convert.

Return type:

None