File hal-linux-5.4.239-x86-13.patch of Package kernel-source-0504rtai
diff --git a/Documentation/ipipe.rst b/Documentation/ipipe.rst
new file mode 100644
index 000000000000..9fba08b66253
--- /dev/null
+++ b/Documentation/ipipe.rst
@@ -0,0 +1,924 @@
+.. include:: <isonum.txt>
+
+===================================
+The Interrupt Pipeline (aka I-pipe)
+===================================
+
+:Copyright: |copy| 2018: Philippe Gerum
+
+Purpose
+=======
+
+Using Linux as a host for lightweight software cores specialized in
+delivering very short and bounded response times has been a popular
+way of supporting real-time applications in the embedded space over
+the years.
+
+This design - known as the *dual kernel* approach - introduces a small
+real-time infrastructure which schedules time-critical activities
+independently from the main kernel. Application threads co-managed by
+this infrastructure still benefit from the ancillary kernel services
+such as virtual memory management, and can also leverage the rich GPOS
+feature set Linux provides such as networking, data storage or GUIs.
+
+Although the real-time infrastructure has to present specific driver
+stack and API implementations to applications, there are nonetheless
+significant upsides to keeping the real-time core separate from the
+GPOS infrastructure:
+
+- because the two kernels are independent, real-time activities are
+ not serialized with GPOS operations internally, removing potential
+ delays which might be induced by the non time-critical
+ work. Likewise, there is no requirement for keeping the GPOS
+ operations fine-grained and highly preemptible at any time, which
+ would otherwise induce noticeable overhead on low-end hardware, due
+ to the requirement for pervasive task priority inheritance and IRQ
+ threading.
+
+- the functional isolation of the real-time infrastructure from the
+ rest of the kernel code restricts common bug hunting to the scope of
+ the smaller kernel, excluding most interactions with the very large
+ GPOS kernel base.
+
+- with a dedicated infrastructure providing a specific, well-defined
+ set of real-time services, applications can unambiguously figure out
+ which API calls are available for supporting time-critical work,
+ excluding all the rest as being potentially non-deterministic with
+ respect to response time.
+
+To support such a *dual kernel system*, we need the kernel to exhibit
+a high-priority execution context, for running out-of-band real-time
+duties concurrently to the regular operations.
+
+.. NOTE:: The I-pipe only introduces the basic mechanisms for hosting
+          such a real-time core, enabling the common programming model
+          for its applications in user-space. It does *not* implement
+          the real-time core per se, which should be provided by a
+          separate kernel component.
+
+The issue of interrupt response time
+====================================
+
+The real-time core has to act upon device interrupts with no delay,
+regardless of the regular kernel operations which may be ongoing when
+the interrupt is received by the CPU.
+
+However, to protect from deadlocks and maintain data integrity, Linux
+normally hard disables interrupts around any critical section of code
+which must not be preempted by interrupt handlers on the same CPU,
+enforcing a strictly serialized execution among those contexts.
+
+The unpredictable delay this may cause before external events can be
+handled is a major roadblock for kernel components requiring
+predictable and very short response times to external events, in the
+range of a few microseconds.
+
+Therefore, there is a basic requirement for prioritizing interrupt
+masking and delivery between the real-time core and GPOS operations,
+while maintaining consistent internal serialization for the kernel.
+
+To address this issue, the I-pipe implements a mechanism called
+*interrupt pipelining* which turns all device IRQs into NMIs, only to
+run NMI-safe interrupt handlers from the perspective of the regular
+kernel activities.
+
+Two-stage IRQ pipeline
+======================
+
+.. _pipeline:
+Interrupt pipelining is a lightweight approach based on the
+introduction of a separate, high-priority execution stage for running
+out-of-band interrupt handlers immediately upon IRQ receipt, which
+cannot be delayed by the in-band, regular kernel work even if the
+latter serializes the execution by - seemingly - disabling interrupts.
+
+IRQs which have no handlers in the high priority stage may be deferred
+on the receiving CPU until the out-of-band activity has quiesced on
+that CPU. Eventually, the preempted in-band code can resume normally,
+which may involve handling the deferred interrupts.
+
+In other words, interrupts are flowing down from the out-of-band to
+the in-band interrupt stages, which form a two-stage pipeline for
+prioritizing interrupt delivery.
+
+The runtime context of the out-of-band interrupt handlers is known as
+the *head stage* of the pipeline, as opposed to the in-band kernel
+activities sitting on the *root stage*::
+
+                 Out-of-band                In-band
+                 IRQ handlers()             IRQ handlers()
+             __________   _______________________   ______
+                .     /  /  .            .     /  /  .
+                .    /  /   .            .    /  /   .
+                .   /  /    .            .   /  /    .
+                ___/  /_____________________/  /     .
+    [IRQ] -----> ______________________________/     .
+                .           .            .           .
+                .   Head    .            .    Root   .
+                .   Stage   .            .    Stage  .
+    ____________________________________________________
+
+
+A software core may base its own activities on the head stage,
+interposing on specific IRQ events, for delivering real-time
+capabilities to a particular set of applications. Meanwhile, the
+regular kernel operations keep going over the root stage unaffected,
+only delayed by short preemption times for running the out-of-band
+work.
+
+.. NOTE:: Interrupt pipelining is a partial implementation of [#f2]_,
+ in which an interrupt *stage* is a limited form of an
+ operating system *domain*.
+
+Virtual interrupt flag
+----------------------
+
+.. _flag:
+As hinted earlier, predictable response time of out-of-band handlers
+to IRQ receipts requires the in-band kernel work not to be allowed to
+delay them by masking interrupts in the CPU.
+
+However, critical sections delimited this way by the in-band code must
+still be enforced for the *root stage*, so that system integrity is
+not at risk. This means that although out-of-band IRQ handlers may run
+at any time while the *head stage* is accepting interrupts, in-band
+IRQ handlers should be allowed to run only when the root stage is
+accepting interrupts too.
+
+So we need to decouple the interrupt masking and delivery logic which
+applies to the head stage from the one in effect on the root stage, by
+implementing a dual interrupt control mechanism.
+
+To this end, a software logic managing a virtual interrupt flag (aka
+*IPIPE_STALL_FLAG*) is introduced by the interrupt pipeline between
+the hardware and the generic IRQ management layer. This logic can mask
+IRQs from the perspective of the regular kernel work when
+:c:func:`local_irq_save`, :c:func:`local_irq_disable` or any
+lock-controlled masking operation such as :c:func:`spin_lock_irqsave` is
+called, while still accepting IRQs from the CPU for immediate delivery
+to out-of-band handlers.
+
+The head stage protects from interrupts by disabling them in the CPU's
+status register, while the root stage disables interrupts only
+virtually. A stage for which interrupts are disabled is said to be
+*stalled*. Conversely, *unstalling* a stage means re-enabling
+interrupts for it.
+
+Obviously, stalling the head stage implicitly means disabling
+further IRQ receipts for the root stage too.
+
+Interrupt deferral for the *root stage*
+---------------------------------------
+
+.. _deferral:
+.. _deferred:
+When the root stage is stalled by setting the virtual interrupt flag,
+the occurrence of any incoming IRQ which was not delivered to the
+*head stage* is recorded into a per-CPU log, postponing its actual
+delivery to the root stage.
+
+The delivery of the interrupt event to the corresponding in-band IRQ
+handler is deferred until the in-band kernel code clears the virtual
+interrupt flag by calling :c:func:`local_irq_enable` or any of its
+variants, which unstalls the root stage. When this happens, the
+interrupt state is resynchronized by playing the log, firing the
+in-band handlers for which an IRQ was set pending.
+
+::
+
+ /* Both stages unstalled on entry */
+ local_irq_save(flags);
+ <IRQx received: no out-of-band handler>
+ (pipeline logs IRQx event)
+ ...
+ local_irq_restore(flags);
+ (pipeline plays IRQx event)
+ handle_IRQx_interrupt();
+
+If the root stage is unstalled at the time of the IRQ receipt, the
+in-band handler is immediately invoked, just like with the
+non-pipelined IRQ model.
+
+.. NOTE:: The principle of deferring interrupt delivery based on a
+ software flag coupled to an event log has been originally
+ described as "Optimistic interrupt protection" in [#f1]_.
+
+Device interrupts virtually turned into NMIs
+--------------------------------------------
+
+From the standpoint of the in-band kernel code (i.e. the code running
+over the *root* interrupt stage), the interrupt pipelining logic
+virtually turns all device IRQs into NMIs, for running out-of-band
+handlers.
+
+.. _re-entry:
+For this reason, out-of-band code may generally **NOT** re-enter
+in-band code, for preventing creepy situations like this one::
+
+ /* in-band context */
+ spin_lock_irqsave(&lock, flags);
+ <IRQx received: out-of-band handler installed>
+ handle_oob_event();
+ /* attempted re-entry to in-band from out-of-band. */
+ in_band_routine();
+ spin_lock_irqsave(&lock, flags);
+ <DEADLOCK>
+ ...
+ ...
+ ...
+ ...
+ spin_unlock_irqrestore(&lock, flags);
+
+Even in absence of any attempt to get a spinlock recursively, the
+outer in-band code in the example above is entitled to assume that no
+access race can occur on the current CPU while interrupts are
+masked. Re-entering in-band code from an out-of-band handler would
+invalidate this assumption.
+
+In rare cases, we may need to fix up the in-band kernel routines in
+order to allow out-of-band handlers to call them. Typically, atomic_
+helpers are such routines, which serialize in-band and out-of-band
+callers.
+
+Virtual/Synthetic interrupt vectors
+-----------------------------------
+
+.. _synthetic:
+.. _virtual:
+The pipeline introduces an additional type of interrupts, which are
+purely software-originated, with no hardware involvement. These IRQs
+can be triggered by any kernel code. So-called virtual IRQs are
+inherently per-CPU events.
+
+Because the common pipeline flow_ applies to virtual interrupts, it
+is possible to attach them to out-of-band and/or in-band handlers,
+just like device interrupts.
+
+.. NOTE:: virtual interrupts and regular softirqs differ in essence:
+ the latter only exist in the in-band context, and therefore
+ cannot trigger out-of-band activities.
+
+Virtual interrupt vectors are allocated by a call to
+:c:func:`ipipe_alloc_virq`, and conversely released with
+:c:func:`ipipe_free_virq`.
+
+For instance, a virtual interrupt can be used for triggering an
+in-band activity on the root stage from the head stage as follows::
+
+ #include <linux/ipipe.h>
+
+ static void virq_handler(unsigned int virq, void *cookie)
+ {
+ do_in_band_work();
+ }
+
+ void install_virq(void)
+ {
+ unsigned int virq;
+ ...
+ virq = ipipe_alloc_virq();
+ ...
+ ipipe_request_irq(ipipe_root_domain, virq, virq_handler,
+ handler_arg, NULL);
+ }
+
+An out-of-band handler can schedule the execution of
+:c:func:`virq_handler` like this::
+
+ ipipe_post_irq_root(virq);
+
+Conversely, a virtual interrupt can be handled from the out-of-band
+context::
+
+ static void virq_oob_handler(unsigned int virq, void *cookie)
+ {
+ do_oob_work();
+ }
+
+ void install_virq(void)
+ {
+ unsigned int virq;
+ ...
+ virq = ipipe_alloc_virq();
+ ...
+ ipipe_request_irq(ipipe_head_domain, virq, virq_oob_handler,
+ handler_arg, NULL);
+ }
+
+Any in-band code can trigger the immediate execution of
+:c:func:`virq_oob_handler` on the head stage as follows::
+
+ ipipe_post_irq_head(virq);
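+
+When the virtual interrupt is no longer needed, the handler and the
+vector should be released. This is only a minimal teardown sketch,
+assuming :c:func:`ipipe_free_irq` as the converse of
+:c:func:`ipipe_request_irq`::
+
+    void uninstall_virq(unsigned int virq)
+    {
+            ipipe_free_irq(ipipe_head_domain, virq);
+            ipipe_free_virq(virq);
+    }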
+
+Pipelined interrupt flow
+------------------------
+
+.. _flow:
+When interrupt pipelining is enabled, IRQs are first delivered to the
+pipeline entry point via a call to the generic
+:c:func:`__ipipe_dispatch_irq` routine. Before this happens, the event
+has been propagated through the arch-specific code for handling an IRQ::
+
+ asm_irq_entry
+ -> irqchip_handle_irq()
+ -> ipipe_handle_domain_irq()
+ -> __ipipe_grab_irq()
+ -> __ipipe_dispatch_irq()
+ -> irq_flow_handler()
+ <IRQ delivery logic>
+
+Contrary to the non-pipelined model, the generic IRQ flow handler does
+*not* call the in-band interrupt handler immediately, but only runs
+the irqchip-specific handler for acknowledging the incoming IRQ event
+in the hardware.
+
+.. _Holding interrupt lines:
+If the interrupt is of the *level-triggered*, *fasteoi* or *percpu*
+type, the irqchip is given a chance to hold the interrupt line,
+typically by masking it, until either the out-of-band or the in-band
+handler has run. This addresses the following scenario, which happens
+for a similar reason while an IRQ thread waits to be scheduled in,
+requiring the same kind of provision::
+
+ /* root stage stalled on entry */
+ asm_irq_entry
+ ...
+ -> __ipipe_dispatch_irq()
+ ...
+ <IRQ logged, delivery deferred>
+ asm_irq_exit
+ /*
+ * CPU allowed to accept interrupts again with IRQ cause not
+ * acknowledged in device yet => **IRQ storm**.
+ */
+ asm_irq_entry
+ ...
+ asm_irq_exit
+ asm_irq_entry
+ ...
+ asm_irq_exit
+
+IRQ delivery logic
+------------------
+
+If an out-of-band handler exists for the interrupt received,
+:c:func:`__ipipe_dispatch_irq` invokes it immediately, after switching
+the execution context to the head stage if not current yet.
+
+Otherwise, if the execution context is currently over the root stage
+and unstalled, the pipeline core delivers it immediately to the
+in-band handler.
+
+In all other cases, the interrupt is only marked as pending in the
+per-CPU log, then the interrupt frame is left.
+
+Alternate scheduling
+====================
+
+The I-pipe promotes the idea that a *dual kernel* system should keep
+the functional overlap between the kernel and the real-time core
+minimal. To this end, a real-time thread should be merely seen as a
+regular task with additional scheduling capabilities guaranteeing very
+low response times.
+
+To support this idea, the I-pipe enables kthreads and regular user
+tasks to run alternatively in the out-of-band execution context
+introduced by the interrupt pipeline_ (aka *head* stage), or the
+common in-band kernel context for GPOS operations (aka *root* stage).
+
+As a result, real-time core applications in user-space benefit from
+the common Linux programming model - including virtual memory
+protection - and still have access to the regular Linux services for
+carrying out non-time-critical work.
+
+Task migration to the head stage
+--------------------------------
+
+Low latency response time to events can be achieved when Linux tasks
+wait for them from the out-of-band execution context. The real-time
+core is responsible for switching a task to such a context as part of
+its task management rules; the I-pipe facilitates this migration with
+dedicated services.
+
+The migration process of a task from the GPOS/in-band context to the
+high-priority, out-of-band context is as follows:
+
+1. :c:func:`__ipipe_migrate_head` is invoked from the migrating task
+ context, with the same prerequisites as for calling
+ :c:func:`schedule` (preemption enabled, interrupts on).
+
+.. _`in-band sleep operation`:
+2. the caller is put to interruptible sleep state (S).
+
+3. before resuming in-band operations, the next task picked by the
+ (regular kernel) scheduler on the same CPU for replacing the
+ migrating task fires :c:func:`ipipe_migration_hook` which the
+ real-time core should override (*__weak* binding). Before the call,
+ the head stage is stalled, interrupts are disabled in the CPU. The
+ root execution stage is still current though.
+
+4. the real-time core's implementation of
+ :c:func:`ipipe_migration_hook` is passed a pointer to the
+ task_struct descriptor of the migrating task. This routine is expected
+ to perform the necessary steps for taking control over the task on
+ behalf of the real-time core, re-scheduling its code appropriately
+ over the head stage. This typically involves resuming it from the
+ `out-of-band suspended state`_ applied during the converse migration
+ path.
+
+5. at some point later, when the migrated task is picked by the
+ real-time scheduler, it resumes execution on the head stage with
+ the register file previously saved by the kernel scheduler in
+ :c:func:`switch_to` at step 1.
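+
+The migration hook mentioned in steps 3 and 4 might be overridden by
+the real-time core along the following lines; this is only a sketch,
+and :c:func:`rt_resume_thread` stands for whatever core-specific
+service resumes a thread from the out-of-band suspended state::
+
+    void ipipe_migration_hook(struct task_struct *p)
+    {
+            /*
+             * Head stage stalled, hard IRQs off, root stage current:
+             * hand @p over to the real-time scheduler so that it can
+             * resume on the head stage.
+             */
+            rt_resume_thread(p);
+    }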
+
+Task migration to the root stage
+--------------------------------
+
+Sometimes, a real-time thread may want to leave the out-of-band
+context, continuing execution from the in-band context instead, so as
+to:
+
+- run non time-critical (in-band) work involving regular system calls
+ handled by the kernel,
+
+- recover from CPU exceptions, such as handling major memory access
+ faults, for which there is no point in caring about response time,
+ and which therefore make no sense to duplicate in the real-time core
+ anyway.
+
+.. NOTE:: The discussion about exception_ handling covers the last
+          point in detail.
+
+The migration process of a task from the high-priority, out-of-band
+context to the GPOS/in-band context is as follows:
+
+1. the real-time core schedules an in-band handler for execution which
+ should call :c:func:`wake_up_process` to unblock the migrating task
+ from the standpoint of the kernel scheduler. This is the
+ counterpart of the :ref:`in-band sleep operation <in-band sleep
+ operation>` from the converse migration path. A virtual_ IRQ can be
+ used for scheduling such an event from the out-of-band context, as
+ sketched after this list.
+
+.. _`out-of-band suspended state`:
+2. the real-time core suspends execution of the current task from its
+ own standpoint. The real-time scheduler is assumed to be using the
+ common :c:func:`switch_to` routine for switching task contexts.
+
+3. at some point later, the out-of-band context is exited by the
+ current CPU when no more high-priority work is left, causing the
+ preempted in-band kernel code to resume execution on the root
+ stage. The handler scheduled at step 1 eventually runs, waking up
+ the migrating task from the standpoint of the kernel.
+
+4. the migrating task resumes from the tail scheduling code of the
+ real-time scheduler, where it suspended in step 2. Noticing the
+ migration, the real-time core eventually calls
+ :c:func:`__ipipe_reenter_root` for finalizing the transition of the
+ incoming task to the root stage.
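+
+Step 1 can be implemented with the virtual interrupt support described
+earlier, *wakeup_virq* being a vector whose in-band handler was
+installed with :c:func:`ipipe_request_irq` as shown in the previous
+examples. The rt_* helpers below are hypothetical core services, only
+the I-pipe calls belong to the interface described in this document::
+
+    static void wakeup_virq_handler(unsigned int virq, void *cookie)
+    {
+            struct task_struct *p;
+
+            /* In-band side: wake up the tasks leaving the head stage. */
+            while ((p = rt_pop_wakeup_request()) != NULL)
+                    wake_up_process(p);
+    }
+
+    /* Out-of-band side, before suspending the migrating task: */
+    rt_push_wakeup_request(current);
+    ipipe_post_irq_root(wakeup_virq);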
+
+Binding to the real-time core
+-----------------------------
+
+.. _binding:
+The I-pipe facilitates fine-grained per-thread management from the
+real-time core, as opposed to per-process. For this reason, the
+real-time core should at least implement a mechanism for turning a
+regular task into a real-time thread with extended capabilities,
+binding it to the core.
+
+The real-time core should inform the kernel about its intent to
+receive notifications about that task, by calling
+:c:func:`ipipe_enable_notifier` when that task is current.
+
+For this reason, the binding operation is usually carried out by a
+dedicated system call exposed by the real-time core, which a regular
+task would invoke.
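+
+A minimal sketch of such a binding service, assuming the core exposes
+it through a dedicated system call and performs its own bookkeeping in
+a hypothetical :c:func:`rt_make_thread` helper::
+
+    static int rt_bind_current(void)
+    {
+            int ret;
+
+            ret = rt_make_thread(current);  /* core-specific setup */
+            if (ret)
+                    return ret;
+
+            ipipe_enable_notifier(current);
+
+            return 0;
+    }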
+
+.. NOTE:: Whether there should be distinct procedures for binding
+ processes *and* threads to the real-time core, or only a
+ thread binding procedure is up to the real-time core
+ implementation.
+
+Notifications
+-------------
+
+Exception handling
+~~~~~~~~~~~~~~~~~~
+
+.. _exception:
+If a processor exception is raised while the CPU is busy running a
+real-time thread in the out-of-band context (e.g. due to some invalid
+memory access, bad instruction, FPU or alignment error etc), the task
+may have to leave such context immediately if the fault handler is not
+protected against out-of-band interrupts, and therefore cannot be
+properly serialized with out-of-band code.
+
+The I-pipe notifies the real-time core about incoming exceptions early
+from the low-level fault handlers, but only when some out-of-band code
+was running when the exception was taken. The real-time core may then
+take action, such as reconciling the current task's execution context
+with the kernel's expectations before the task may traverse the
+regular fault handling code.
+
+.. HINT:: Enabling debuggers to trace real-time threads involves
+          dealing with the debug traps the former may poke into the
+          debuggee's code for breakpointing duties.
+
+The notification is issued by a call to :c:func:`__ipipe_notify_trap`
+which in turn invokes the :c:func:`ipipe_trap_hook` routine the
+real-time core should override for receiving those events (*__weak*
+binding). Interrupts are **disabled** in the CPU when
+:c:func:`ipipe_trap_hook` is called::
+
+ /* out-of-band code running */
+ *bad_pointer = 42;
+ [ACCESS EXCEPTION]
+ /* low-level fault handler in arch/<arch>/mm */
+ -> do_page_fault()
+ -> __ipipe_notify_trap(...)
+ /* real-time core */
+ -> ipipe_trap_hook(...)
+ -> forced task migration to root stage
+ ...
+ -> handle_mm_fault()
+
+.. NOTE:: handling minor memory access faults only requiring quick PTE
+ fixups should not involve switching the current task to the
+ in-band context though. Instead, the fixup code should be
+ made atomic_ for serializing accesses from any context.
+
+System calls
+~~~~~~~~~~~~
+
+A real-time core interfaced with the kernel via the I-pipe may
+introduce its own set of system calls. From the standpoint of the
+kernel, this is a foreign set of calls, which can be distinguished
+unambiguously from regular ones based on an arch-specific marker.
+
+.. HINT:: Syscall numbers from this set might have a different base,
+ and/or some high-order bit set which regular syscall numbers
+ would not have.
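+
+For illustration only, such a marker could be a dedicated high-order
+bit; the names below are hypothetical and not part of the I-pipe
+interface::
+
+    #define RT_SYSCALL_BIT      0x10000000
+    #define is_foreign(nr)      (!!((nr) & RT_SYSCALL_BIT))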
+
+If a task bound to the real-time core issues any system call,
+regardless of which of the kernel or real-time core should handle it,
+the latter must be given the opportunity to:
+
+- perform the service directly, possibly switching the caller to
+ out-of-band context first should the request require it.
+
+- pass the request downward to the normal system call path on the root
+ stage, possibly switching the caller to in-band context if needed.
+
+If a regular task (i.e. *not* yet known to the real-time core)
+issues any foreign system call, the real-time core is given a chance
+to handle it. This way, a foreign system call which would initially
+bind a regular task to the real-time core would be delivered to the
+real-time core as expected (see binding_).
+
+The I-pipe intercepts system calls early in the kernel entry code,
+delivering them to the proper handler according to the following
+logic::
+
+ is_foreign(syscall_nr)?
+ Y: is_bound(task)
+ Y: -> ipipe_fastcall_hook()
+ N: -> ipipe_syscall_hook()
+ N: is_bound(task)
+ Y: -> ipipe_syscall_hook()
+ N: -> normal syscall handling
+
+:c:func:`ipipe_fastcall_hook` is the fast path for handling foreign
+system calls from tasks already running in out-of-band context.
+
+:c:func:`ipipe_syscall_hook` is a slower path for handling requests
+which might require the caller to switch to the out-of-band context
+first before proceeding.
+
+Kernel events
+~~~~~~~~~~~~~
+
+The last set of notifications involves pure kernel events which the
+real-time core may need to know about, as they may affect its own task
+management. Except for IPIPE_KEVT_CLEANUP which is called for *any*
+exiting user-space task, all other notifications are only issued for
+tasks bound to the real-time core (which may involve kthreads).
+
+The notification is issued by a call to :c:func:`__ipipe_notify_kevent`
+which in turn invokes the :c:func:`ipipe_kevent_hook` routine the
+real-time core should override for receiving those events (*__weak*
+binding). Interrupts are **enabled** in the CPU when
+:c:func:`ipipe_kevent_hook` is called.
+
+The notification hook is given the event type code, and a single
+pointer argument which relates to the event type.
+
+The following events are defined (include/linux/ipipe_domain.h):
+
+- IPIPE_KEVT_SCHEDULE(struct task_struct *next)
+
+ sent in preparation of a context switch, right before the memory
+ context is switched to *next*.
+
+- IPIPE_KEVT_SIGWAKE(struct task_struct *target)
+
+ sent when *target* is about to receive a signal. The real-time core
+ may decide to schedule a transition of the recipient to the root
+ stage in order to have it handle that signal asap, which is commonly
+ required for keeping the kernel sane. This notification is always
+ sent from the context of the issuer.
+
+- IPIPE_KEVT_SETAFFINITY(struct ipipe_migration_data *p)
+
+ sent when p->task is about to move to CPU p->dest_cpu.
+
+- IPIPE_KEVT_EXIT(struct task_struct *current)
+
+ sent from :c:func:`do_exit` before the current task has dropped the
+ files and mappings it owns.
+
+- IPIPE_KEVT_CLEANUP(struct mm_struct *mm)
+
+ sent before *mm* is entirely dropped, before the mappings are
+ exited. Per-process resources which might be maintained by the
+ real-time core could be released there, as all threads have exited.
+
+ .. NOTE:: IPIPE_KEVT_SETSCHED is deprecated, and should not be used.
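+
+A core-side hook dispatching on the events listed above could look
+like the sketch below; the exact prototype, and the convention of
+returning zero to keep the event propagating, are assumptions based on
+the description given in this section::
+
+    int ipipe_kevent_hook(int kevent, void *data)
+    {
+            switch (kevent) {
+            case IPIPE_KEVT_SIGWAKE:
+                    /* struct task_struct *target = data; */
+                    break;
+            case IPIPE_KEVT_CLEANUP:
+                    /* struct mm_struct *mm = data; drop per-process state */
+                    break;
+            default:
+                    break;
+            }
+
+            return 0;
+    }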
+
+Prerequisites
+=============
+
+The interrupt pipeline requires the following features to be available
+from the target kernel:
+
+- Generic IRQ handling
+- Clock event abstraction
+
+Implementation
+==============
+
+The following kernel areas are involved in interrupt pipelining:
+
+- Generic IRQ core
+
+ * IRQ flow handlers
+
+ Generic flow handlers acknowledge the incoming IRQ event in the
+ hardware by calling the appropriate irqchip-specific
+ handler. However, the generic flow_ handlers do not immediately
+ invoke the in-band interrupt handlers, but leave this decision to
+ the pipeline core which calls them, according to the pipelined
+ delivery logic.
+
+- Arch-specific bits
+
+ * CPU interrupt mask handling
+
+ The architecture-specific code which manipulates the interrupt
+ flag in the CPU's state register
+ (i.e. arch/<arch>/include/asm/irqflags.h) is split between real
+ and virtual interrupt control:
+
+ + the *hard_local_irq* level helpers affect the hardware state in
+ the CPU.
+
+ + the *arch_* level helpers affect the virtual interrupt flag_
+ implemented by the pipeline core for controlling the root stage
+ protection against interrupts.
+
+ This means that generic helpers from <linux/irqflags.h> such as
+ :c:func:`local_irq_disable` and :c:func:`local_irq_enable`
+ actually refer to the virtual protection scheme when interrupts
+ are pipelined, implementing interrupt deferral_ for the protected
+ in-band code running over the root stage.
+
+ * Assembly-level IRQ, exception paths
+
+ Since interrupts are only virtually masked by the in-band code,
+ IRQs can still be taken by the CPU although they should not be
+ visible from the root stage when they happen in the following
+ situations:
+
+ + when the virtual protection flag_ is raised, meaning the root
+ stage does not accept IRQs, in which case interrupt deferral_
+ happens.
+
+ + when the CPU runs out-of-band code, regardless of the state of
+ the virtual protection flag.
+
+ In both cases, the low-level assembly code handling incoming IRQs
+ takes a fast exit path unwinding the interrupt frame early,
+ instead of running the common in-band epilogue which checks for
+ task rescheduling opportunities and pending signals.
+
+ Likewise, the low-level fault/exception handling code also takes a
+ fast exit path under the same circumstances. Typically, an
+ out-of-band handler causing a minor page fault should benefit from
+ a lightweight PTE fixup performed by the high-level fault handler,
+ but is not allowed to traverse the rescheduling logic upon return
+ from exception.
+
+- Scheduler core
+
+ * CPUIDLE support
+
+ The logic of the CPUIDLE framework has to account for those
+ specific issues the interrupt pipelining introduces:
+
+ - the kernel might be idle in the sense that no in-band activity
+ is scheduled yet, and may be planning to shut down a timer device
+ affected by the C3STOP (mis)feature. However, at the same time,
+ some out-of-band code might be waiting for a tick event already
+ programmed in that timer hardware via the timer_ interposition
+ mechanism.
+
+ - switching the CPU to a power saving state may incur a
+ significant latency, particularly for waking it up before it can
+ handle an incoming IRQ, which is at odds with the purpose of
+ interrupt pipelining.
+
+ Obviously, we don't want the CPUIDLE logic to turn off the
+ hardware timer when C3STOP is in effect for the timer device,
+ which would cause the pending out-of-band event to be
+ lost.
+
+ Likewise, the wake up latency induced by entering a sleep state on
+ a particular hardware may not always be acceptable.
+
+ Since the in-band kernel code does not know about the out-of-band
+ code's plans by design, CPUIDLE calls :c:func:`ipipe_cpuidle_control`
+ to figure out whether the out-of-band system is fine with entering
+ the idle state as well. This routine should be overridden by the
+ out-of-band code for receiving such notification (*__weak*
+ binding).
+
+ If this hook returns a boolean *true* value, CPUIDLE proceeds as
+ usual. Otherwise, the CPU is simply denied entry into the idle
+ state, leaving the timer hardware enabled.
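+
+ A core-side override might look like this; the parameter list shown
+ here is an assumption, and :c:func:`oob_timer_is_armed` stands for
+ whatever core-specific check tells whether an out-of-band event is
+ pending on this CPU::
+
+     bool ipipe_cpuidle_control(struct cpuidle_device *dev,
+                                struct cpuidle_state *state)
+     {
+             /* Deny idling whenever out-of-band timing is in flight. */
+             return !oob_timer_is_armed();
+     }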
+
+ .. CAUTION:: If some out-of-band code waiting for an external event
+    cannot bear the latency that might be induced by the default
+    architecture-specific CPU idling code, then CPUIDLE is not usable
+    and should be disabled at build time.
+
+ * Kernel preemption control (PREEMPT)
+
+ :c:func:`__preempt_schedule_irq` reconciles the virtual interrupt
+ state - which has not been touched by the assembly level code upon
+ kernel entry - with basic assumptions made by the scheduler core,
+ such as entering with interrupts disabled. It should be called by
+ the arch-specific assembly code in replacement of
+ :c:func:`preempt_schedule_irq`, from the call site dealing with
+ kernel preemption upon return from IRQ or system call.
+
+- Timer management
+
+ * Timer interposition
+
+.. _timer:
+ The timer interposition mechanism is designed for handing over
+ control of the hardware tick device in use by the kernel to an
+ out-of-band timing logic. Typically, a real-time co-kernel would
+ make good use of this feature, for grabbing control over the timer
+ hardware.
+
+ Once some out-of-band logic has grabbed control over the timer
+ device by calling :c:func:`ipipe_select_timers`, it can install
+ its own out-of-band handlers using :c:func:`ipipe_timer_start`.
+ From that point, it must carry out the timing requests from the
+ in-band timer core (e.g. hrtimers) in addition to its own timing
+ duties.
+
+ In other words, once the interposition is set up, the
+ functionality of the tick device is shared between the in-band and
+ out-of-band contexts, with only the latter actually programming
+ the hardware.
+
+ This mechanism is based on the clock event abstraction (`struct
+ clock_event_device`). Clock event devices which may be controlled
+ this way need their drivers to be specifically adapted for such
+ use:
+
+ + the interrupt handler receiving tick IRQs must check with
+ :c:func:`clockevent_ipipe_stolen` whether it actually controls
+ the hardware. A non-zero return from this routine means that it
+ does not, and should therefore skip the timer acknowledge
+ code, which would have run earlier in that case (see the sketch
+ below).
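+
+ A sketch of an adapted driver's tick handler; the foo_* names are
+ hypothetical, only :c:func:`clockevent_ipipe_stolen` belongs to the
+ pipeline interface::
+
+     static irqreturn_t foo_timer_interrupt(int irq, void *dev_id)
+     {
+             struct clock_event_device *evt = dev_id;
+
+             if (!clockevent_ipipe_stolen(evt))
+                     foo_timer_ack();        /* device-specific acknowledge */
+
+             evt->event_handler(evt);
+
+             return IRQ_HANDLED;
+     }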
+
+- Generic locking & atomic
+
+ * Generic atomic ops
+
+.. _atomic:
+ The effect of virtualizing interrupt protection must be reversed
+ for atomic helpers in <asm-generic/{atomic|bitops/atomic}.h> and
+ <asm-generic/cmpxchg-local.h>, so that no interrupt can preempt
+ their execution, regardless of the stage their caller lives
+ on.
+
+ This is required to keep those helpers usable on data which
+ might be accessed concurrently from both stages.
+
+ The usual way to revert such virtualization consists of delimiting
+ the protected section with :c:func:`hard_local_irq_save` and
+ :c:func:`hard_local_irq_restore` calls, in replacement for
+ :c:func:`local_irq_save` and :c:func:`local_irq_restore`
+ respectively, as sketched below.
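+
+ For instance, the generic fallback for an atomic add would roughly
+ change as follows; this mirrors the substitution described above and
+ is not the literal kernel implementation::
+
+     static inline void atomic_add(int i, atomic_t *v)
+     {
+             unsigned long flags;
+
+             flags = hard_local_irq_save();
+             v->counter += i;
+             hard_local_irq_restore(flags);
+     }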
+
+ * Hard spinlocks
+
+ The pipeline core introduces one more spinlock type:
+
+ + *hard* spinlocks manipulate the CPU interrupt mask, and don't
+ affect the kernel preemption state in locking/unlocking
+ operations.
+
+ This type of spinlock is useful for implementing a critical
+ section to serialize concurrent accesses from both in-band and
+ out-of-band contexts, i.e. from root and head stages. Obviously,
+ sleeping into a critical section protected by a hard spinlock
+ would be a very bad idea.
+
+ In other words, hard spinlocks are not subject to virtual
+ interrupt masking, therefore can be used to serialize with
+ out-of-band activities, including from the in-band kernel
+ code. At any rate, those sections ought to be quite short, for
+ keeping latency low.
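+
+ A minimal usage sketch, assuming the IPIPE_DEFINE_SPINLOCK() helper
+ provided by the pipeline's locking support for declaring such a
+ lock::
+
+     IPIPE_DEFINE_SPINLOCK(shared_lock);
+
+     void update_shared_state(void)
+     {
+             unsigned long flags;
+
+             /* Hard-disables IRQs, does not touch preemption state. */
+             spin_lock_irqsave(&shared_lock, flags);
+             /* ... short critical section shared with the head stage ... */
+             spin_unlock_irqrestore(&shared_lock, flags);
+     }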
+
+- Drivers
+
+ * IRQ chip drivers
+
+ .. _irqchip:
+ irqchip drivers need to be specifically adapted for supporting the
+ pipelined interrupt model. The irqchip descriptor gains additional
+ handlers:
+
+ + irq_chip.irq_hold is an optional handler called by the pipeline
+ core upon events from *level-triggered*, *fasteoi* and *percpu*
+ types. See `Holding interrupt lines`_.
+
+ When specified in the descriptor, irq_chip.irq_hold should
+ perform as follows, depending on the hardware acknowledge logic:
+
+ + level -> mask[+ack]
+ + percpu -> mask[+ack][+eoi]
+ + fasteoi -> mask+eoi
+
+ .. CAUTION:: proper acknowledge and/or EOI is important when
+ holding a line, as those operations may also
+ decrease the current interrupt priority level for
+ the CPU, allowing same or lower priority
+ out-of-band interrupts to be taken while the
+ initial IRQ might be deferred_ for the root stage.
+
+ + irq_chip.irq_release is the converse operation to
+ irq_chip.irq_hold, releasing an interrupt line from the held
+ state.
+
+ The :c:func:`ipipe_end_irq` routine invokes the available
+ handler for releasing the interrupt line. The pipeline core
+ calls :c:func:`irq_release` automatically for each IRQ which has
+ been accepted by an in-band handler (`IRQ_HANDLED` status). This
+ routine should be called explicitly by out-of-band handlers
+ before returning to their caller.
+
+ `IRQCHIP_PIPELINE_SAFE` must be added to the `flags` member of the
+ `struct irq_chip` descriptor of a pipeline-aware irqchip driver.
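+
+ A pipeline-aware descriptor could therefore look like the sketch
+ below, where the foo_* handlers are hypothetical driver code::
+
+     static struct irq_chip foo_irq_chip = {
+             .name           = "FOO",
+             .irq_ack        = foo_ack_irq,
+             .irq_mask       = foo_mask_irq,
+             .irq_unmask     = foo_unmask_irq,
+             .irq_hold       = foo_hold_irq,     /* mask[+ack], see above */
+             .irq_release    = foo_release_irq,  /* converse of irq_hold */
+             .flags          = IRQCHIP_PIPELINE_SAFE,
+     };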
+
+ .. NOTE:: :c:func:`irq_set_chip` will complain loudly with a
+ kernel warning whenever the irqchip descriptor passed
+ does not bear the `IRQCHIP_PIPELINE_SAFE` flag and
+ CONFIG_IPIPE is enabled.
+
+- Misc
+
+ * :c:func:`printk`
+
+ :c:func:`printk` may be called by out-of-band code safely, without
+ incurring extra latency. The output is delayed until the in-band
+ code resumes, and the console driver(s) can handle it.
+
+ * Tracing core
+
+ Tracepoints can be traversed by out-of-band code safely. Dynamic
+ tracing is available to a kernel running the pipelined interrupt
+ model too.
+
+Terminology
+===========
+
+.. _terminology:
+====================== =======================================================
+Term                   Definition
+====================== =======================================================
+Head stage             high-priority execution context triggered by out-of-band IRQs
+Root stage             regular kernel context performing GPOS work
+Out-of-band code       code running over the head stage
+In-band code           code running over the root stage
+Scheduler              the regular, Linux kernel scheduler
+Real-time scheduler    the out-of-band task scheduling logic implemented on top of the I-pipe
+====================== =======================================================
+
+Resources
+=========
+
+.. [#f1] Stodolsky, Chen & Bershad; "Fast Interrupt Priority Management in Operating System Kernels"
+ https://www.usenix.org/legacy/publications/library/proceedings/micro93/full_papers/stodolsky.txt
+.. [#f2] Yaghmour, Karim; "ADEOS - Adaptive Domain Environment for Operating Systems"
+ https://www.opersys.com/ftp/pub/Adeos/adeos.pdf
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6002252692af..52719931e087 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -131,8 +131,8 @@ config X86
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
- select HAVE_ARCH_JUMP_LABEL
- select HAVE_ARCH_JUMP_LABEL_RELATIVE
+ select HAVE_ARCH_JUMP_LABEL if !IPIPE
+ select HAVE_ARCH_JUMP_LABEL_RELATIVE if !IPIPE
select HAVE_ARCH_KASAN if X86_64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS if MMU
@@ -150,7 +150,7 @@ config X86
select HAVE_ASM_MODVERSIONS
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
- select HAVE_CONTEXT_TRACKING if X86_64
+ select HAVE_CONTEXT_TRACKING if X86_64 && !IPIPE
select HAVE_COPY_THREAD_TLS
select HAVE_C_RECORDMCOUNT
select HAVE_DEBUG_KMEMLEAK
@@ -172,6 +172,12 @@ config X86
select HAVE_IOREMAP_PROT
select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
select HAVE_IRQ_TIME_ACCOUNTING
+ select HAVE_IPIPE_SUPPORT if X86_64
+ select HAVE_IPIPE_TRACER_SUPPORT
+ select IPIPE_HAVE_HOSTRT if IPIPE
+ select IPIPE_HAVE_SAFE_THREAD_INFO if IPIPE
+ select IPIPE_WANT_PTE_PINNING if IPIPE
+ select IPIPE_HAVE_VM_NOTIFIER if IPIPE
select HAVE_KERNEL_BZIP2
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_LZ4
@@ -552,6 +558,7 @@ config X86_UV
depends on KEXEC_CORE
depends on X86_X2APIC
depends on PCI
+ depends on !IPIPE
---help---
This option is needed in order to support SGI Ultraviolet systems.
If you don't have one of these, you should say N here.
@@ -758,6 +765,7 @@ if HYPERVISOR_GUEST
config PARAVIRT
bool "Enable paravirtualization code"
+ depends on !IPIPE
---help---
This changes the kernel so it can modify itself when it is run
under a hypervisor, potentially improving performance significantly
@@ -964,7 +972,7 @@ config CALGARY_IOMMU_ENABLED_BY_DEFAULT
config MAXSMP
bool "Enable Maximum number of SMP Processors and NUMA Nodes"
- depends on X86_64 && SMP && DEBUG_KERNEL
+ depends on X86_64 && SMP && DEBUG_KERNEL && !IPIPE
select CPUMASK_OFFSTACK
---help---
Enable maximum number of CPUS and NUMA Nodes for this architecture.
@@ -1064,6 +1072,8 @@ config SCHED_MC_PRIO
If unsure say Y here.
+source "kernel/ipipe/Kconfig"
+
config UP_LATE_INIT
def_bool y
depends on !SMP && X86_LOCAL_APIC
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3f8e22615812..e8a2c167afb8 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -17,6 +17,7 @@
#include <linux/tracehook.h>
#include <linux/audit.h>
#include <linux/seccomp.h>
+#include <linux/unistd.h>
#include <linux/signal.h>
#include <linux/export.h>
#include <linux/context_tracking.h>
@@ -48,6 +49,22 @@ __visible inline void enter_from_user_mode(void)
static inline void enter_from_user_mode(void) {}
#endif
+#ifdef CONFIG_IPIPE
+#define disable_local_irqs() do { \
+ hard_local_irq_disable(); \
+ trace_hardirqs_off(); \
+} while (0)
+#define enable_local_irqs() do { \
+ trace_hardirqs_on(); \
+ hard_local_irq_enable(); \
+} while (0)
+#define check_irqs_disabled() hard_irqs_disabled()
+#else
+#define disable_local_irqs() local_irq_disable()
+#define enable_local_irqs() local_irq_enable()
+#define check_irqs_disabled() irqs_disabled()
+#endif
+
static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
{
#ifdef CONFIG_X86_64
@@ -143,7 +160,7 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
*/
while (true) {
/* We have work to do. */
- local_irq_enable();
+ enable_local_irqs();
if (cached_flags & _TIF_NEED_RESCHED)
schedule();
@@ -168,7 +185,7 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
fire_user_return_notifiers();
/* Disable IRQs and retry */
- local_irq_disable();
+ disable_local_irqs();
cached_flags = READ_ONCE(current_thread_info()->flags);
@@ -188,11 +205,23 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
lockdep_assert_irqs_disabled();
lockdep_sys_exit();
+again:
cached_flags = READ_ONCE(ti->flags);
if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
exit_to_usermode_loop(regs, cached_flags);
+ if (ipipe_user_intret_notifier_enabled(ti)) {
+ int ret;
+
+ enable_local_irqs();
+ ret = __ipipe_notify_user_intreturn();
+ disable_local_irqs();
+
+ if (ret == 0)
+ goto again;
+ }
+
/* Reload ti->flags; we may have rescheduled above. */
cached_flags = READ_ONCE(ti->flags);
@@ -250,7 +279,7 @@ static void syscall_slow_exit_work(struct pt_regs *regs, u32 cached_flags)
* Called with IRQs on and fully valid regs. Returns with IRQs off in a
* state such that we can immediately switch to user mode.
*/
-__visible inline void syscall_return_slowpath(struct pt_regs *regs)
+static void __syscall_return_slowpath(struct pt_regs *regs, bool do_work)
{
struct thread_info *ti = current_thread_info();
u32 cached_flags = READ_ONCE(ti->flags);
@@ -258,8 +287,8 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
if (IS_ENABLED(CONFIG_PROVE_LOCKING) &&
- WARN(irqs_disabled(), "syscall %ld left IRQs disabled", regs->orig_ax))
- local_irq_enable();
+ WARN(check_irqs_disabled(), "syscall %ld left IRQs disabled", regs->orig_ax))
+ enable_local_irqs();
rseq_syscall(regs);
@@ -267,21 +296,39 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
* First do one-time work. If these work items are enabled, we
* want to run them exactly once per syscall exit with IRQs on.
*/
- if (unlikely(cached_flags & SYSCALL_EXIT_WORK_FLAGS))
+ if (unlikely(do_work && (cached_flags & SYSCALL_EXIT_WORK_FLAGS)))
syscall_slow_exit_work(regs, cached_flags);
- local_irq_disable();
+ disable_local_irqs();
prepare_exit_to_usermode(regs);
}
+__visible inline void syscall_return_slowpath(struct pt_regs *regs)
+{
+ __syscall_return_slowpath(regs, true);
+}
+
#ifdef CONFIG_X86_64
__visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
{
struct thread_info *ti;
+ int __maybe_unused ret;
enter_from_user_mode();
- local_irq_enable();
+ enable_local_irqs();
ti = current_thread_info();
+
+ #define __SYSCALL_MASK (~0)
+ ret = ipipe_handle_syscall(ti, nr & __SYSCALL_MASK, regs);
+ if (ret > 0) {
+ disable_local_irqs();
+ return;
+ }
+ if (ret < 0) {
+ __syscall_return_slowpath(regs, false);
+ return;
+ }
+
if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
nr = syscall_trace_enter(regs);
@@ -302,6 +349,39 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
#endif
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+
+#ifdef CONFIG_IPIPE
+#ifdef CONFIG_X86_32
+static inline int pipeline_syscall(struct thread_info *ti,
+ unsigned long nr, struct pt_regs *regs)
+{
+ return ipipe_handle_syscall(ti, nr, regs);
+}
+#else
+static inline int pipeline_syscall(struct thread_info *ti,
+ unsigned long nr, struct pt_regs *regs)
+{
+ struct pt_regs regs64 = *regs;
+ int ret;
+
+ regs64.di = (unsigned int)regs->bx;
+ regs64.si = (unsigned int)regs->cx;
+ regs64.r10 = (unsigned int)regs->si;
+ regs64.r8 = (unsigned int)regs->di;
+ regs64.r9 = (unsigned int)regs->bp;
+ ret = ipipe_handle_syscall(ti, nr, &regs64);
+ regs->ax = (unsigned int)regs64.ax;
+
+ return ret;
+}
+#endif /* CONFIG_X86_32 */
+#else /* CONFIG_IPIPE */
+static inline int pipeline_syscall(struct thread_info *ti,
+ unsigned long nr, struct pt_regs *regs)
+{
+ return 0;
+}
+#endif /* CONFIG_IPIPE */
/*
* Does a 32-bit syscall. Called with IRQs on in CONTEXT_KERNEL. Does
* all entry and exit work and returns with IRQs off. This function is
@@ -312,11 +392,22 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
{
struct thread_info *ti = current_thread_info();
unsigned int nr = (unsigned int)regs->orig_ax;
+ int ret;
#ifdef CONFIG_IA32_EMULATION
ti->status |= TS_COMPAT;
#endif
+ ret = pipeline_syscall(ti, nr, regs);
+ if (ret > 0) {
+ disable_local_irqs();
+ return;
+ }
+ if (ret < 0) {
+ __syscall_return_slowpath(regs, false);
+ return;
+ }
+
if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
/*
* Subtlety here: if ptrace pokes something larger than
@@ -352,7 +443,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
__visible void do_int80_syscall_32(struct pt_regs *regs)
{
enter_from_user_mode();
- local_irq_enable();
+ enable_local_irqs();
do_syscall_32_irqs_on(regs);
}
@@ -376,7 +467,7 @@ __visible long do_fast_syscall_32(struct pt_regs *regs)
enter_from_user_mode();
- local_irq_enable();
+ enable_local_irqs();
/* Fetch EBP from where the vDSO stashed it. */
if (
@@ -394,7 +485,7 @@ __visible long do_fast_syscall_32(struct pt_regs *regs)
) {
/* User code screwed up. */
- local_irq_disable();
+ disable_local_irqs();
regs->ax = -EFAULT;
prepare_exit_to_usermode(regs);
return 0; /* Keep it simple: use IRET. */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index bd7a4ad0937c..6b22645a5606 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -30,6 +30,7 @@
#include <asm/hw_irq.h>
#include <asm/page_types.h>
#include <asm/irqflags.h>
+#include <asm/ipipe_base.h>
#include <asm/paravirt.h>
#include <asm/percpu.h>
#include <asm/asm.h>
@@ -63,7 +64,12 @@ END(native_usergs_sysret64)
.endm
.macro TRACE_IRQS_IRETQ
- TRACE_IRQS_FLAGS EFLAGS(%rsp)
+#ifdef CONFIG_TRACE_IRQFLAGS
+ btl $9, EFLAGS(%rsp) /* interrupts off? */
+ jnc 1f
+ TRACE_IRQS_ON_VIRT
+1:
+#endif
.endm
/*
@@ -77,7 +83,8 @@ END(native_usergs_sysret64)
* make sure the stack pointer does not get reset back to the top
* of the debug stack, and instead just reuses the current stack.
*/
-#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_TRACE_IRQFLAGS)
+#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_TRACE_IRQFLAGS) \
+ && !defined(CONFIG_IPIPE)
.macro TRACE_IRQS_OFF_DEBUG
call debug_stack_set_zero
@@ -334,6 +341,7 @@ END(__switch_to_asm)
*/
ENTRY(ret_from_fork)
UNWIND_HINT_EMPTY
+ HARD_COND_ENABLE_INTERRUPTS
movq %rax, %rdi
call schedule_tail /* rdi: 'prev' task parameter */
@@ -577,8 +585,13 @@ ENTRY(interrupt_entry)
1:
ENTER_IRQ_STACK old_rsp=%rdi save_ret=1
- /* We entered an interrupt context - irqs are off: */
+#ifndef CONFIG_IPIPE
+ /* We entered an interrupt context - irqs are off unless
+ pipelining is enabled, in which case we defer tracing until
+ __ipipe_do_sync_stage() where the virtual IRQ state is
+ updated for the root stage. */
TRACE_IRQS_OFF
+#endif
ret
END(interrupt_entry)
@@ -606,7 +619,17 @@ common_interrupt:
addq $-0x80, (%rsp) /* Adjust vector to [-256, -1] range */
call interrupt_entry
UNWIND_HINT_REGS indirect=1
+#ifdef CONFIG_IPIPE
+ call __ipipe_handle_irq
+ testl %eax, %eax
+ jnz ret_from_intr
+ LEAVE_IRQ_STACK
+ testb $3, CS(%rsp)
+ jz retint_kernel_early
+ jmp retint_user_early
+#else
call do_IRQ /* rdi points to pt_regs */
+#endif
/* 0(%rsp): old RSP */
ret_from_intr:
DISABLE_INTERRUPTS(CLBR_ANY)
@@ -621,6 +644,7 @@ ret_from_intr:
GLOBAL(retint_user)
mov %rsp,%rdi
call prepare_exit_to_usermode
+retint_user_early:
TRACE_IRQS_IRETQ
GLOBAL(swapgs_restore_regs_and_return_to_usermode)
@@ -675,12 +699,17 @@ retint_kernel:
jnc 1f
cmpl $0, PER_CPU_VAR(__preempt_count)
jnz 1f
+#ifdef CONFIG_IPIPE
+ call __ipipe_preempt_schedule_irq
+#else
call preempt_schedule_irq
+#endif
1:
#endif
/*
* The iretq could re-enable interrupts:
*/
+ retint_kernel_early:
TRACE_IRQS_IRETQ
GLOBAL(restore_regs_and_return_to_kernel)
@@ -799,6 +828,28 @@ _ASM_NOKPROBE(common_interrupt)
/*
* APIC interrupts.
*/
+#ifdef CONFIG_IPIPE
+.macro apicinterrupt2 num sym
+ENTRY(\sym)
+ UNWIND_HINT_IRET_REGS
+ ASM_CLAC
+ pushq $~(\num)
+.Lcommon_\sym:
+ call interrupt_entry
+ UNWIND_HINT_REGS indirect=1
+ call __ipipe_handle_irq
+ testl %eax, %eax
+ jnz ret_from_intr
+ LEAVE_IRQ_STACK
+ testb $3, CS(%rsp)
+ jz retint_kernel_early
+ jmp retint_user_early
+END(\sym)
+.endm
+.macro apicinterrupt3 num sym do_sym
+apicinterrupt2 \num \sym
+.endm
+#else /* !CONFIG_IPIPE */
.macro apicinterrupt3 num sym do_sym
ENTRY(\sym)
UNWIND_HINT_IRET_REGS
@@ -811,6 +862,7 @@ ENTRY(\sym)
END(\sym)
_ASM_NOKPROBE(\sym)
.endm
+#endif /* !CONFIG_IPIPE */
/* Make sure APIC interrupt handlers end up in the irqentry section: */
#define PUSH_SECTION_IRQENTRY .pushsection .irqentry.text, "ax"
@@ -856,6 +908,14 @@ apicinterrupt THERMAL_APIC_VECTOR thermal_interrupt smp_thermal_interrupt
apicinterrupt CALL_FUNCTION_SINGLE_VECTOR call_function_single_interrupt smp_call_function_single_interrupt
apicinterrupt CALL_FUNCTION_VECTOR call_function_interrupt smp_call_function_interrupt
apicinterrupt RESCHEDULE_VECTOR reschedule_interrupt smp_reschedule_interrupt
+#ifdef CONFIG_IPIPE
+apicinterrupt2 IPIPE_RESCHEDULE_VECTOR ipipe_reschedule_interrupt
+apicinterrupt2 IPIPE_CRITICAL_VECTOR ipipe_critical_interrupt
+#endif
+#endif
+
+#ifdef CONFIG_IPIPE
+apicinterrupt2 IPIPE_HRTIMER_VECTOR ipipe_hrtimer_interrupt
#endif
apicinterrupt ERROR_APIC_VECTOR error_interrupt smp_error_interrupt
@@ -870,8 +930,51 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
*/
#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + (x) * 8)
-.macro idtentry_part do_sym, has_error_code:req, read_cr2:req, paranoid:req, shift_ist=-1, ist_offset=0
+/*
+ * occupy r13, r14, r15. r12 is used to prevent clobbering of the saved CR2 value.
+ */
+.macro ipipe_idtentry_prologue paranoid=0 trapnr=-1 skip_label=-invalid-
+#ifdef CONFIG_IPIPE
+ movq EFLAGS(%rsp), %r14 /* regs->flags */
+ movq %rsp, %rdi /* pt_regs pointer */
+ movl $\trapnr, %esi /* trap number */
+ subq $8, %rsp
+ movq %rsp, %rdx /* &flags */
+ call __ipipe_trap_prologue
+ popq %r13
+ mov %rax, %r15 /* save propagation status */
+ .if \paranoid == 0 /* paranoid may not skip handler */
+ testl %eax, %eax
+ jg \skip_label /* skip regular handler if > 0 */
+ .endif
+#endif
+.endm
+
+.macro ipipe_idtentry_epilogue paranoid=0 skip_label=-invalid-
+#ifdef CONFIG_IPIPE
+ testl %r15d, %r15d
+ jnz 1000f
+ movq %rsp, %rdi /* pt_regs pointer */
+ movq %r13, %rsi /* &flags from prologue */
+ movq %r14, %rdx /* original regs->flags before fixup */
+ call __ipipe_trap_epilogue
+1000:
+ .if \paranoid == 0 /* paranoid implies normal epilogue */
+ testl %r15d, %r15d
+ jz 1001f
+\skip_label:
+ UNWIND_HINT_REGS
+ DISABLE_INTERRUPTS(CLBR_ANY)
+ testb $3, CS(%rsp)
+ jz retint_kernel_early
+ jmp retint_user_early
+ .endif
+1001:
+#endif
+.endm
+
+.macro idtentry_part do_sym, has_error_code:req, read_cr2:req, trapnr:req, paranoid:req, shift_ist=-1, ist_offset=0
.if \paranoid
call paranoid_entry
/* returned flag: ebx=0: need swapgs on exit, ebx=1: don't need it */
@@ -902,6 +1005,8 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
.Lfrom_kernel_no_context_tracking_\@:
.endif
+ ipipe_idtentry_prologue paranoid=\paranoid trapnr=\trapnr skip_label=kernel_skip_\@
+
movq %rsp, %rdi /* pt_regs pointer */
.if \has_error_code
@@ -921,6 +1026,8 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
call \do_sym
+ ipipe_idtentry_epilogue paranoid=\paranoid skip_label=kernel_skip_\@
+
.if \shift_ist != -1
addq $\ist_offset, CPU_TSS_IST(\shift_ist)
.endif
@@ -972,7 +1079,7 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
* @paranoid == 2 is special: the stub will never switch stacks. This is for
* #DF: if the thread stack is somehow unusable, we'll still get a useful OOPS.
*/
-.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0 create_gap=0 read_cr2=0
+.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0 create_gap=0 read_cr2=0 trapnr=-1
ENTRY(\sym)
UNWIND_HINT_IRET_REGS offset=\has_error_code*8
@@ -1010,7 +1117,7 @@ ENTRY(\sym)
.Lfrom_usermode_no_gap_\@:
.endif
- idtentry_part \do_sym, \has_error_code, \read_cr2, \paranoid, \shift_ist, \ist_offset
+ idtentry_part \do_sym, \has_error_code, \read_cr2, \trapnr, \paranoid, \shift_ist, \ist_offset
.if \paranoid == 1
/*
@@ -1019,26 +1126,26 @@ ENTRY(\sym)
* run in real process context if user_mode(regs).
*/
.Lfrom_usermode_switch_stack_\@:
- idtentry_part \do_sym, \has_error_code, \read_cr2, paranoid=0
+ idtentry_part \do_sym, \has_error_code, \read_cr2, \trapnr, paranoid=0
.endif
_ASM_NOKPROBE(\sym)
END(\sym)
.endm
-idtentry divide_error do_divide_error has_error_code=0
-idtentry overflow do_overflow has_error_code=0
-idtentry bounds do_bounds has_error_code=0
-idtentry invalid_op do_invalid_op has_error_code=0
-idtentry device_not_available do_device_not_available has_error_code=0
-idtentry double_fault do_double_fault has_error_code=1 paranoid=2 read_cr2=1
-idtentry coprocessor_segment_overrun do_coprocessor_segment_overrun has_error_code=0
-idtentry invalid_TSS do_invalid_TSS has_error_code=1
-idtentry segment_not_present do_segment_not_present has_error_code=1
-idtentry spurious_interrupt_bug do_spurious_interrupt_bug has_error_code=0
-idtentry coprocessor_error do_coprocessor_error has_error_code=0
-idtentry alignment_check do_alignment_check has_error_code=1
-idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0
+idtentry divide_error do_divide_error has_error_code=0 trapnr=0
+idtentry overflow do_overflow has_error_code=0 trapnr=4
+idtentry bounds do_bounds has_error_code=0 trapnr=5
+idtentry invalid_op do_invalid_op has_error_code=0 trapnr=6
+idtentry device_not_available do_device_not_available has_error_code=0 trapnr=7
+idtentry double_fault do_double_fault has_error_code=1 paranoid=2 read_cr2=1 trapnr=8
+idtentry coprocessor_segment_overrun do_coprocessor_segment_overrun has_error_code=0 trapnr=9
+idtentry invalid_TSS do_invalid_TSS has_error_code=1 trapnr=10
+idtentry segment_not_present do_segment_not_present has_error_code=1 trapnr=11
+idtentry spurious_interrupt_bug do_spurious_interrupt_bug has_error_code=0 trapnr=15
+idtentry coprocessor_error do_coprocessor_error has_error_code=0 trapnr=16
+idtentry alignment_check do_alignment_check has_error_code=1 trapnr=17
+idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0 trapnr=19
/*
@@ -1082,10 +1189,14 @@ EXPORT_SYMBOL(native_load_gs_index)
ENTRY(do_softirq_own_stack)
pushq %rbp
mov %rsp, %rbp
+ HARD_COND_DISABLE_INTERRUPTS
ENTER_IRQ_STACK regs=0 old_rsp=%r11
+ HARD_COND_ENABLE_INTERRUPTS
call __do_softirq
+ HARD_COND_DISABLE_INTERRUPTS
LEAVE_IRQ_STACK regs=0
leaveq
+ HARD_COND_ENABLE_INTERRUPTS
ret
ENDPROC(do_softirq_own_stack)
@@ -1193,24 +1304,28 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
acrn_hv_callback_vector acrn_hv_vector_handler
#endif
+#ifdef CONFIG_IPIPE
+idtentry debug do_debug has_error_code=0 paranoid=1 trapnr=1
+#else
idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
-idtentry int3 do_int3 has_error_code=0 create_gap=1
-idtentry stack_segment do_stack_segment has_error_code=1
+#endif
+idtentry int3 do_int3 has_error_code=0 create_gap=1 trapnr=3
+idtentry stack_segment do_stack_segment has_error_code=1 trapnr=12
#ifdef CONFIG_XEN_PV
idtentry xennmi do_nmi has_error_code=0
idtentry xendebug do_debug has_error_code=0
#endif
-idtentry general_protection do_general_protection has_error_code=1
-idtentry page_fault do_page_fault has_error_code=1 read_cr2=1
+idtentry general_protection do_general_protection has_error_code=1 trapnr=13
+idtentry page_fault do_page_fault has_error_code=1 read_cr2=1 trapnr=14
#ifdef CONFIG_KVM_GUEST
-idtentry async_page_fault do_async_page_fault has_error_code=1 read_cr2=1
+idtentry async_page_fault do_async_page_fault has_error_code=1 read_cr2=1 trapnr=14
#endif
#ifdef CONFIG_X86_MCE
-idtentry machine_check do_mce has_error_code=0 paranoid=1
+idtentry machine_check do_mce has_error_code=0 paranoid=1 trapnr=18
#endif
/*
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index ea5c4167086c..e48c14d34d73 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -39,6 +39,7 @@
#ifdef CONFIG_TRACE_IRQFLAGS
THUNK trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1
+ THUNK trace_hardirqs_on_virt_thunk,trace_hardirqs_on_virt_caller,1
THUNK trace_hardirqs_off_thunk,trace_hardirqs_off_caller,1
#endif
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index a49b1aeb2147..949f0c120ce1 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -437,7 +437,17 @@ static inline void apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v)) {}
extern void apic_ack_irq(struct irq_data *data);
+#ifdef CONFIG_IPIPE
+#ifdef CONFIG_SMP
+struct irq_data;
+void move_xxapic_irq(struct irq_data *data);
+#endif
+#define ack_APIC_irq() do { } while(0)
+static inline void __ack_APIC_irq(void)
+#else /* !CONFIG_IPIPE */
+#define __ack_APIC_irq() ack_APIC_irq()
static inline void ack_APIC_irq(void)
+#endif /* CONFIG_IPIPE */
{
/*
* ack_APIC_irq() actually gets compiled as a single instruction
diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
index 1a8609a15856..3baafeb2dfdc 100644
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -94,7 +94,7 @@ extern void aout_dump_debugregs(struct user *dump);
extern void hw_breakpoint_restore(void);
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_IPIPE)
DECLARE_PER_CPU(int, debug_stack_usage);
static inline void debug_stack_usage_inc(void)
{
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 68a99d2a5f33..6ffdc05ca930 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -309,7 +309,7 @@ static inline void force_reload_TR(void)
*/
static inline void refresh_tss_limit(void)
{
- DEBUG_LOCKS_WARN_ON(preemptible());
+ DEBUG_LOCKS_WARN_ON(!hard_irqs_disabled() && preemptible());
if (unlikely(this_cpu_read(__tss_limit_invalid)))
force_reload_TR();
@@ -326,7 +326,7 @@ static inline void refresh_tss_limit(void)
*/
static inline void invalidate_tss_limit(void)
{
- DEBUG_LOCKS_WARN_ON(preemptible());
+ DEBUG_LOCKS_WARN_ON(!hard_irqs_disabled() && preemptible());
if (unlikely(test_thread_flag(TIF_IO_BITMAP)))
force_reload_TR();
@@ -391,7 +391,7 @@ void alloc_intr_gate(unsigned int n, const void *addr);
extern unsigned long system_vectors[];
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_IPIPE)
DECLARE_PER_CPU(u32, debug_idt_ctr);
static inline bool is_debug_idt_enabled(void)
{
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 06e767bca0c1..65ec50ff7074 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -11,6 +11,7 @@
#ifndef _ASM_X86_FPU_API_H
#define _ASM_X86_FPU_API_H
#include <linux/bottom_half.h>
+#include <linux/irqflags.h>
/*
* Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
@@ -41,16 +42,25 @@ static inline void kernel_fpu_begin(void)
* fpu->state and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in
* a random state.
*/
-static inline void fpregs_lock(void)
+static inline unsigned long fpregs_lock(void)
{
+#ifdef CONFIG_IPIPE
+ return hard_local_irq_save();
+#else
preempt_disable();
local_bh_disable();
+ return 0;
+#endif
}
-static inline void fpregs_unlock(void)
+static inline void fpregs_unlock(unsigned long flags)
{
+#ifdef CONFIG_IPIPE
+ hard_local_irq_restore(flags);
+#else
local_bh_enable();
preempt_enable();
+#endif
}
#ifdef CONFIG_X86_DEBUG_FPU
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 5ed702e2c55f..e4f6e9bbb984 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -25,6 +25,7 @@
/*
* High level FPU state handling functions:
*/
+extern void fpu__initialize(struct fpu *fpu);
extern void fpu__prepare_read(struct fpu *fpu);
extern void fpu__prepare_write(struct fpu *fpu);
extern void fpu__save(struct fpu *fpu);
@@ -648,4 +649,24 @@ static inline void xsetbv(u32 index, u64 value)
: : "a" (eax), "d" (edx), "c" (index));
}
+DECLARE_PER_CPU(bool, in_kernel_fpu);
+
+static inline void kernel_fpu_disable(void)
+{
+ WARN_ON_FPU(this_cpu_read(in_kernel_fpu));
+ this_cpu_write(in_kernel_fpu, true);
+}
+
+static inline void kernel_fpu_enable(void)
+{
+ WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
+ this_cpu_write(in_kernel_fpu, false);
+}
+
+static inline bool kernel_fpu_disabled(void)
+{
+ return this_cpu_read(in_kernel_fpu);
+}
+
+
#endif /* _ASM_X86_FPU_INTERNAL_H */
diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
index 89789e8c80f6..dc376e53aa2a 100644
--- a/arch/x86/include/asm/i8259.h
+++ b/arch/x86/include/asm/i8259.h
@@ -26,7 +26,7 @@ extern unsigned int cached_irq_mask;
#define SLAVE_ICW4_DEFAULT 0x01
#define PIC_ICW4_AEOI 2
-extern raw_spinlock_t i8259A_lock;
+IPIPE_DECLARE_RAW_SPINLOCK(i8259A_lock);
/* the PIC may need a careful delay on some platforms, hence specific calls */
static inline unsigned char inb_pic(unsigned int port)
diff --git a/arch/x86/include/asm/ipipe.h b/arch/x86/include/asm/ipipe.h
new file mode 100644
index 000000000000..086208081461
--- /dev/null
+++ b/arch/x86/include/asm/ipipe.h
@@ -0,0 +1,70 @@
+/* -*- linux-c -*-
+ * arch/x86/include/asm/ipipe.h
+ *
+ * Copyright (C) 2007 Philippe Gerum.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __X86_IPIPE_H
+#define __X86_IPIPE_H
+
+#ifdef CONFIG_IPIPE
+
+#define IPIPE_CORE_RELEASE 13
+
+struct ipipe_domain;
+
+struct ipipe_arch_sysinfo {
+};
+
+#define ipipe_processor_id() raw_smp_processor_id()
+
+/* Private interface -- Internal use only */
+
+#define __ipipe_early_core_setup() do { } while(0)
+
+#define __ipipe_enable_irq(irq) irq_to_desc(irq)->chip->enable(irq)
+#define __ipipe_disable_irq(irq) irq_to_desc(irq)->chip->disable(irq)
+
+#ifdef CONFIG_SMP
+void __ipipe_hook_critical_ipi(struct ipipe_domain *ipd);
+#else
+#define __ipipe_hook_critical_ipi(ipd) do { } while(0)
+#endif
+
+void __ipipe_enable_pipeline(void);
+
+#define __ipipe_root_tick_p(regs) ((regs)->flags & X86_EFLAGS_IF)
+
+#define ipipe_notify_root_preemption() __ipipe_notify_vm_preemption()
+
+#endif /* CONFIG_IPIPE */
+
+#if defined(CONFIG_SMP) && defined(CONFIG_IPIPE)
+#define __ipipe_move_root_irq(__desc) \
+ do { \
+ if (!IS_ERR_OR_NULL(__desc)) { \
+ struct irq_chip *__chip = irq_desc_get_chip(__desc); \
+ if (__chip->irq_move) \
+ __chip->irq_move(irq_desc_get_irq_data(__desc)); \
+ } \
+ } while (0)
+#else /* !(CONFIG_SMP && CONFIG_IPIPE) */
+#define __ipipe_move_root_irq(irq) do { } while (0)
+#endif /* !(CONFIG_SMP && CONFIG_IPIPE) */
+
+#endif /* !__X86_IPIPE_H */
diff --git a/arch/x86/include/asm/ipipe_base.h b/arch/x86/include/asm/ipipe_base.h
new file mode 100644
index 000000000000..ea2876aa62b9
--- /dev/null
+++ b/arch/x86/include/asm/ipipe_base.h
@@ -0,0 +1,156 @@
+/* -*- linux-c -*-
+ * arch/x86/include/asm/ipipe_base.h
+ *
+ * Copyright (C) 2007-2012 Philippe Gerum.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __X86_IPIPE_BASE_H
+#define __X86_IPIPE_BASE_H
+
+#include <asm/irq_vectors.h>
+#include <asm/bitsperlong.h>
+
+#ifdef CONFIG_X86_32
+/* 32 from IDT + iret_error + mayday trap */
+#define IPIPE_TRAP_MAYDAY 33 /* Internal recovery trap */
+#define IPIPE_NR_FAULTS 34
+#else
+/* 32 from IDT + mayday trap */
+#define IPIPE_TRAP_MAYDAY 32 /* Internal recovery trap */
+#define IPIPE_NR_FAULTS 33
+#endif
+
+#ifdef CONFIG_X86_LOCAL_APIC
+/*
+ * Special APIC interrupts are mapped above the last defined external
+ * IRQ number.
+ */
+#define nr_apic_vectors (NR_VECTORS - FIRST_SYSTEM_VECTOR)
+#define IPIPE_FIRST_APIC_IRQ NR_IRQS
+#define IPIPE_HRTIMER_IPI ipipe_apic_vector_irq(IPIPE_HRTIMER_VECTOR)
+#ifdef CONFIG_SMP
+#define IPIPE_RESCHEDULE_IPI ipipe_apic_vector_irq(IPIPE_RESCHEDULE_VECTOR)
+#define IPIPE_CRITICAL_IPI ipipe_apic_vector_irq(IPIPE_CRITICAL_VECTOR)
+#endif /* CONFIG_SMP */
+#define IPIPE_NR_XIRQS (NR_IRQS + nr_apic_vectors)
+#define ipipe_apic_irq_vector(irq) ((irq) - IPIPE_FIRST_APIC_IRQ + FIRST_SYSTEM_VECTOR)
+#define ipipe_apic_vector_irq(vec) ((vec) - FIRST_SYSTEM_VECTOR + IPIPE_FIRST_APIC_IRQ)
+#else
+#define IPIPE_NR_XIRQS NR_IRQS
+#endif /* !CONFIG_X86_LOCAL_APIC */
+
+#ifndef __ASSEMBLY__
+
+#include <asm/apicdef.h>
+
+extern unsigned int tsc_khz;
+
+static inline const char *ipipe_clock_name(void)
+{
+ return "tsc";
+}
+
+#define __ipipe_cpu_freq ({ u64 __freq = 1000ULL * tsc_khz; __freq; })
+#define __ipipe_hrclock_freq __ipipe_cpu_freq
+
+#ifdef CONFIG_X86_32
+
+#define ipipe_read_tsc(t) \
+ __asm__ __volatile__("rdtsc" : "=A"(t))
+
+#define ipipe_tsc2ns(t) \
+({ \
+ unsigned long long delta = (t) * 1000000ULL; \
+ unsigned long long freq = __ipipe_hrclock_freq; \
+ do_div(freq, 1000); \
+ do_div(delta, (unsigned)freq + 1); \
+ (unsigned long)delta; \
+})
+
+#define ipipe_tsc2us(t) \
+({ \
+ unsigned long long delta = (t) * 1000ULL; \
+ unsigned long long freq = __ipipe_hrclock_freq; \
+ do_div(freq, 1000); \
+ do_div(delta, (unsigned)freq + 1); \
+ (unsigned long)delta; \
+})
+
+static inline unsigned long __ipipe_ffnz(unsigned long ul)
+{
+ __asm__("bsrl %1, %0":"=r"(ul) : "r"(ul));
+ return ul;
+}
+
+#else /* X86_64 */
+
+#define ipipe_read_tsc(t) do { \
+ unsigned int __a,__d; \
+ asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); \
+ (t) = ((unsigned long)__a) | (((unsigned long)__d)<<32); \
+} while(0)
+
+#define ipipe_tsc2ns(t) (((t) * 1000UL) / (__ipipe_hrclock_freq / 1000000UL))
+#define ipipe_tsc2us(t) ((t) / (__ipipe_hrclock_freq / 1000000UL))
+
+static inline unsigned long __ipipe_ffnz(unsigned long ul)
+{
+ __asm__("bsrq %1, %0":"=r"(ul)
+ : "rm"(ul));
+ return ul;
+}
+
+#ifdef CONFIG_IA32_EMULATION
+#define ipipe_root_nr_syscalls(ti) \
+ ((ti->status & TS_COMPAT) ? IA32_NR_syscalls : NR_syscalls)
+#endif /* CONFIG_IA32_EMULATION */
+
+#endif /* X86_64 */
+
+struct pt_regs;
+struct irq_desc;
+struct ipipe_vm_notifier;
+
+static inline unsigned __ipipe_get_irq_vector(int irq)
+{
+#ifdef CONFIG_X86_IO_APIC
+ unsigned int __ipipe_get_ioapic_irq_vector(int irq);
+ return __ipipe_get_ioapic_irq_vector(irq);
+#elif defined(CONFIG_X86_LOCAL_APIC)
+ return irq >= IPIPE_FIRST_APIC_IRQ ?
+ ipipe_apic_irq_vector(irq) : ISA_IRQ_VECTOR(irq);
+#else
+ return ISA_IRQ_VECTOR(irq);
+#endif
+}
+
+void ipipe_hrtimer_interrupt(void);
+
+void ipipe_reschedule_interrupt(void);
+
+void ipipe_critical_interrupt(void);
+
+int __ipipe_handle_irq(struct pt_regs *regs);
+
+void __ipipe_handle_vm_preemption(struct ipipe_vm_notifier *nfy);
+
+extern int __ipipe_hrtimer_irq;
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* !__X86_IPIPE_BASE_H */
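A minimal usage sketch for the TSC helpers declared in the new ipipe_base.h above; ipipe_tsc_delta_sketch() is a hypothetical helper for illustration only (not part of the patch), and the 2 GHz clock mentioned in the comment is an assumed figure:

/*
 * Illustrative sketch only -- shows the intended use of ipipe_read_tsc()
 * and the ipipe_tsc2ns()/ipipe_tsc2us() conversion macros declared above.
 */
#include <linux/kernel.h>
#include <asm/ipipe_base.h>

static void ipipe_tsc_delta_sketch(void)
{
	unsigned long long t0, t1;

	ipipe_read_tsc(t0);
	/* ... bounded, time-critical work ... */
	ipipe_read_tsc(t1);

	/*
	 * Assuming a 2 GHz TSC (tsc_khz == 2000000), a delta of 3000000
	 * cycles converts to roughly 1500000 ns, i.e. 1500 us.
	 */
	pr_info("elapsed: %lu ns (%lu us)\n",
		(unsigned long)ipipe_tsc2ns(t1 - t0),
		(unsigned long)ipipe_tsc2us(t1 - t0));
}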
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 889f8b1b5b7f..33fe88ae90be 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -106,13 +106,18 @@
#define LOCAL_TIMER_VECTOR 0xec
-#define NR_VECTORS 256
+/* Interrupt pipeline IPIs */
+#define IPIPE_HRTIMER_VECTOR 0xeb
+#define IPIPE_RESCHEDULE_VECTOR 0xea
+#define IPIPE_CRITICAL_VECTOR 0xe9
-#ifdef CONFIG_X86_LOCAL_APIC
-#define FIRST_SYSTEM_VECTOR LOCAL_TIMER_VECTOR
-#else
-#define FIRST_SYSTEM_VECTOR NR_VECTORS
-#endif
+/*
+ * I-pipe: Lowest vector number which may be assigned to a special
+ * APIC IRQ. We must know this at build time.
+ */
+#define FIRST_SYSTEM_VECTOR IPIPE_CRITICAL_VECTOR
+
+#define NR_VECTORS 256
/*
* Size the maximum number of interrupts.
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8a0e56e1dcc9..f696de55e769 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -8,6 +8,10 @@
#include <asm/nospec-branch.h>
+#include <linux/ipipe_trace.h>
+#include <linux/compiler.h>
+#include <asm-generic/ipipe.h>
+
/* Provide __cpuidle; we can't safely include <linux/cpu.h> */
#define __cpuidle __attribute__((__section__(".cpuidle.text")))
@@ -66,14 +70,76 @@ static inline __cpuidle void native_halt(void)
asm volatile("hlt": : :"memory");
}
+static inline int native_irqs_disabled(void)
+{
+ unsigned long flags = native_save_fl();
+
+ return !(flags & X86_EFLAGS_IF);
+}
+
#endif
#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
+#define HARD_COND_ENABLE_INTERRUPTS
+#define HARD_COND_DISABLE_INTERRUPTS
#else
#ifndef __ASSEMBLY__
#include <linux/types.h>
+#ifdef CONFIG_IPIPE
+
+void __ipipe_halt_root(int use_mwait);
+
+static inline notrace unsigned long arch_local_save_flags(void)
+{
+ unsigned long flags;
+
+ flags = (!ipipe_test_root()) << 9;
+ barrier();
+ return flags;
+}
+
+static inline notrace void arch_local_irq_restore(unsigned long flags)
+{
+ barrier();
+ ipipe_restore_root(!(flags & X86_EFLAGS_IF));
+}
+
+static inline notrace void arch_local_irq_disable(void)
+{
+ ipipe_stall_root();
+ barrier();
+}
+
+static inline notrace void arch_local_irq_enable(void)
+{
+ barrier();
+ ipipe_unstall_root();
+}
+
+static inline __cpuidle void arch_safe_halt(void)
+{
+ barrier();
+ __ipipe_halt_root(0);
+}
+
+/* Merge virtual+real interrupt mask bits into a single word. */
+static inline unsigned long arch_mangle_irq_bits(int virt, unsigned long real)
+{
+ return (real & ~(1L << 31)) | ((unsigned long)(virt != 0) << 31);
+}
+
+/* Converse operation of arch_mangle_irq_bits() */
+static inline int arch_demangle_irq_bits(unsigned long *x)
+{
+ int virt = (*x & (1L << 31)) != 0;
+ *x &= ~(1L << 31);
+ return virt;
+}
+
+#else /* !CONFIG_IPIPE */
+
static inline notrace unsigned long arch_local_save_flags(void)
{
return native_save_fl();
@@ -103,6 +169,8 @@ static inline __cpuidle void arch_safe_halt(void)
native_safe_halt();
}
+#endif /* !CONFIG_IPIPE */
+
/*
* Used when interrupts are already enabled or to
* shutdown the processor:
@@ -126,6 +194,14 @@ static inline notrace unsigned long arch_local_irq_save(void)
#define ENABLE_INTERRUPTS(x) sti
#define DISABLE_INTERRUPTS(x) cli
+#ifdef CONFIG_IPIPE
+#define HARD_COND_ENABLE_INTERRUPTS sti
+#define HARD_COND_DISABLE_INTERRUPTS cli
+#else /* !CONFIG_IPIPE */
+#define HARD_COND_ENABLE_INTERRUPTS
+#define HARD_COND_DISABLE_INTERRUPTS
+#endif /* !CONFIG_IPIPE */
+
#ifdef CONFIG_X86_64
#ifdef CONFIG_DEBUG_ENTRY
#define SAVE_FLAGS(x) pushfq; popq %rax
@@ -170,40 +246,156 @@ static inline int arch_irqs_disabled(void)
return arch_irqs_disabled_flags(flags);
}
+
+#ifdef CONFIG_IPIPE
+
+static inline unsigned long hard_local_irq_save_notrace(void)
+{
+ unsigned long flags;
+
+ flags = native_save_fl();
+ native_irq_disable();
+
+ return flags;
+}
+
+static inline void hard_local_irq_restore_notrace(unsigned long flags)
+{
+ native_restore_fl(flags);
+}
+
+static inline void hard_local_irq_disable_notrace(void)
+{
+ native_irq_disable();
+}
+
+static inline void hard_local_irq_enable_notrace(void)
+{
+ native_irq_enable();
+}
+
+static inline int hard_irqs_disabled(void)
+{
+ return native_irqs_disabled();
+}
+
+#define hard_irqs_disabled_flags(flags) arch_irqs_disabled_flags(flags)
+
+#ifdef CONFIG_IPIPE_TRACE_IRQSOFF
+
+static inline void hard_local_irq_disable(void)
+{
+ if (!native_irqs_disabled()) {
+ native_irq_disable();
+ ipipe_trace_begin(0x80000000);
+ }
+}
+
+static inline void hard_local_irq_enable(void)
+{
+ if (native_irqs_disabled()) {
+ ipipe_trace_end(0x80000000);
+ native_irq_enable();
+ }
+}
+
+static inline unsigned long hard_local_irq_save(void)
+{
+ unsigned long flags;
+
+ flags = native_save_fl();
+ if (flags & X86_EFLAGS_IF) {
+ native_irq_disable();
+ ipipe_trace_begin(0x80000001);
+ }
+
+ return flags;
+}
+
+static inline void hard_local_irq_restore(unsigned long flags)
+{
+ if (flags & X86_EFLAGS_IF)
+ ipipe_trace_end(0x80000001);
+
+ native_restore_fl(flags);
+}
+
+#else /* !CONFIG_IPIPE_TRACE_IRQSOFF */
+
+static inline unsigned long hard_local_irq_save(void)
+{
+ return hard_local_irq_save_notrace();
+}
+
+static inline void hard_local_irq_restore(unsigned long flags)
+{
+ hard_local_irq_restore_notrace(flags);
+}
+
+static inline void hard_local_irq_enable(void)
+{
+ hard_local_irq_enable_notrace();
+}
+
+static inline void hard_local_irq_disable(void)
+{
+ hard_local_irq_disable_notrace();
+}
+
+#endif /* CONFIG_IPIPE_TRACE_IRQSOFF */
+
+static inline unsigned long hard_local_save_flags(void)
+{
+ return native_save_fl();
+}
+
+#endif /* CONFIG_IPIPE */
+
#endif /* !__ASSEMBLY__ */
#ifdef __ASSEMBLY__
#ifdef CONFIG_TRACE_IRQFLAGS
# define TRACE_IRQS_ON call trace_hardirqs_on_thunk;
+#ifdef CONFIG_IPIPE
+# define TRACE_IRQS_ON_VIRT call trace_hardirqs_on_virt_thunk;
+#else
+# define TRACE_IRQS_ON_VIRT TRACE_IRQS_ON
+#endif
# define TRACE_IRQS_OFF call trace_hardirqs_off_thunk;
#else
# define TRACE_IRQS_ON
+# define TRACE_IRQS_ON_VIRT
# define TRACE_IRQS_OFF
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# ifdef CONFIG_X86_64
-# define LOCKDEP_SYS_EXIT call lockdep_sys_exit_thunk
+# define LOCKDEP_SYS_EXIT call lockdep_sys_exit_thunk
# define LOCKDEP_SYS_EXIT_IRQ \
TRACE_IRQS_ON; \
sti; \
call lockdep_sys_exit_thunk; \
cli; \
TRACE_IRQS_OFF;
+
# else
-# define LOCKDEP_SYS_EXIT \
+# define LOCKDEP_SYS_EXIT \
pushl %eax; \
pushl %ecx; \
pushl %edx; \
+ pushfl; \
+ sti; \
call lockdep_sys_exit; \
+ popfl; \
popl %edx; \
popl %ecx; \
popl %eax;
+
# define LOCKDEP_SYS_EXIT_IRQ
# endif
#else
# define LOCKDEP_SYS_EXIT
# define LOCKDEP_SYS_EXIT_IRQ
#endif
-#endif /* __ASSEMBLY__ */
+#endif /* __ASSEMBLY__ */
#endif
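A minimal sketch (not from the patched tree) of how the arch_mangle_irq_bits()/arch_demangle_irq_bits() pair introduced above round-trips the virtual and hardware interrupt state; ipipe_flags_roundtrip_sketch() is a hypothetical helper used purely for illustration:

/*
 * Illustrative sketch only -- exercises arch_mangle_irq_bits() and
 * arch_demangle_irq_bits() from the hunk above to show how the root
 * domain's virtual stall bit is folded into bit 31 of the combined
 * flags word while the hardware IF state is left untouched.
 */
#include <linux/irqflags.h>
#include <linux/bug.h>

static void ipipe_flags_roundtrip_sketch(void)
{
	unsigned long real = native_save_fl();		/* hardware EFLAGS snapshot */
	int virt = !!ipipe_test_root();			/* root domain virtually stalled? */
	unsigned long merged = arch_mangle_irq_bits(virt, real);
	int virt_back = arch_demangle_irq_bits(&merged);

	/* Demangling strips bit 31 and hands the virtual state back. */
	WARN_ON(virt_back != virt || merged != (real & ~(1UL << 31)));
}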
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 16ae821483c8..2b4afca4e15f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -174,7 +174,8 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
load_mm_ldt(next);
#endif
- DEBUG_LOCKS_WARN_ON(preemptible());
+ DEBUG_LOCKS_WARN_ON(preemptible() &&
+ (!IS_ENABLED(CONFIG_IPIPE) || !hard_irqs_disabled()));
}
void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
@@ -214,6 +215,9 @@ extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
struct task_struct *tsk);
#define switch_mm_irqs_off switch_mm_irqs_off
+#define ipipe_switch_mm_head(prev, next, tsk) \
+ switch_mm_irqs_off(prev, next, tsk)
+
#define activate_mm(prev, next) \
do { \
paravirt_activate_mm((prev), (next)); \
@@ -379,6 +383,7 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
temp_mm_state_t temp_state;
lockdep_assert_irqs_disabled();
+ hard_cond_local_irq_disable();
temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
switch_mm_irqs_off(NULL, mm, current);
@@ -403,6 +408,7 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
{
lockdep_assert_irqs_disabled();
switch_mm_irqs_off(NULL, prev_state.mm, current);
+ hard_cond_local_irq_enable();
/*
* Restore the breakpoints if they were disabled before the temporary mm
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1ae7c20f5469..a3716561ca4f 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -136,6 +136,7 @@ static inline u32 read_pkru(void)
static inline void write_pkru(u32 pkru)
{
struct pkru_state *pk;
+ unsigned long flags;
if (!boot_cpu_has(X86_FEATURE_OSPKE))
return;
@@ -147,11 +148,11 @@ static inline void write_pkru(u32 pkru)
* written to the CPU. The FPU restore on return to userland would
* otherwise load the previous value again.
*/
- fpregs_lock();
+ flags = fpregs_lock();
if (pk)
pk->pkru = pkru;
__write_pkru(pkru);
- fpregs_unlock();
+ fpregs_unlock(flags);
}
static inline int pte_young(pte_t pte)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index a4de7aa7500f..056664e8fd03 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -52,10 +52,15 @@
struct task_struct;
#include <asm/cpufeature.h>
#include <linux/atomic.h>
+#include <ipipe/thread_info.h>
struct thread_info {
unsigned long flags; /* low level flags */
u32 status; /* thread synchronous flags */
+#ifdef CONFIG_IPIPE
+ unsigned long ipipe_flags;
+ struct ipipe_threadinfo ipipe_data;
+#endif
};
#define INIT_THREAD_INFO(tsk) \
@@ -159,6 +164,17 @@ struct thread_info {
#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY)
#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
+/* ti->ipipe_flags */
+#define TIP_HEAD 0 /* Runs in head domain */
+#define TIP_NOTIFY 1 /* Notify head domain about kernel events */
+#define TIP_MAYDAY 2 /* MAYDAY call is pending */
+#define TIP_USERINTRET 3 /* Notify on IRQ/trap return to root userspace */
+
+#define _TIP_HEAD (1 << TIP_HEAD)
+#define _TIP_NOTIFY (1 << TIP_NOTIFY)
+#define _TIP_MAYDAY (1 << TIP_MAYDAY)
+#define _TIP_USERINTRET (1 << TIP_USERINTRET)
+
#define STACK_WARN (THREAD_SIZE/8)
/*
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index f1ea8f0c8b12..888c837747c3 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -15,6 +15,9 @@
*/
typedef unsigned long long cycles_t;
+extern struct clocksource clocksource_tsc;
+#define __ipipe_hostrt_clock clocksource_tsc
+
extern unsigned int cpu_khz;
extern unsigned int tsc_khz;
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index d6a0e57ecc07..7c0c145ef7a3 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -7,6 +7,7 @@
#include <linux/compiler.h>
#include <linux/kasan-checks.h>
#include <linux/string.h>
+#include <linux/ipipe.h>
#include <asm/asm.h>
#include <asm/page.h>
#include <asm/smap.h>
@@ -68,7 +69,7 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
static inline bool pagefault_disabled(void);
# define WARN_ON_IN_IRQ() \
- WARN_ON_ONCE(!in_task() && !pagefault_disabled())
+ WARN_ON_ONCE(ipipe_root_p && !in_task() && !pagefault_disabled())
#else
# define WARN_ON_IN_IRQ()
#endif
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3578ad248bc9..621d855431c7 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -79,6 +79,7 @@ obj-y += reboot.o
obj-$(CONFIG_X86_MSR) += msr.o
obj-$(CONFIG_X86_CPUID) += cpuid.o
obj-$(CONFIG_PCI) += early-quirks.o
+obj-$(CONFIG_IPIPE) += ipipe.o
apm-y := apm_32.o
obj-$(CONFIG_APM) += apm.o
obj-$(CONFIG_SMP) += smp.o
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 68c734032523..1ee5246338ed 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -35,6 +35,7 @@
#include <linux/dmi.h>
#include <linux/smp.h>
#include <linux/mm.h>
+#include <linux/ipipe_tickdev.h>
#include <asm/trace/irq_vectors.h>
#include <asm/irq_remapping.h>
@@ -272,10 +273,10 @@ void native_apic_icr_write(u32 low, u32 id)
{
unsigned long flags;
- local_irq_save(flags);
+ flags = hard_local_irq_save();
apic_write(APIC_ICR2, SET_APIC_DEST_FIELD(id));
apic_write(APIC_ICR, low);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
u64 native_apic_icr_read(void)
@@ -483,16 +484,20 @@ static int lapic_next_deadline(unsigned long delta,
static int lapic_timer_shutdown(struct clock_event_device *evt)
{
+ unsigned long flags;
unsigned int v;
/* Lapic used as dummy for broadcast ? */
if (evt->features & CLOCK_EVT_FEAT_DUMMY)
return 0;
+ flags = hard_local_irq_save();
v = apic_read(APIC_LVTT);
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write(APIC_LVTT, v);
apic_write(APIC_TMICT, 0);
+ hard_local_irq_restore(flags);
+
return 0;
}
@@ -527,6 +532,17 @@ static void lapic_timer_broadcast(const struct cpumask *mask)
#endif
}
+#ifdef CONFIG_IPIPE
+static void lapic_itimer_ack(void)
+{
+ __ack_APIC_irq();
+}
+
+static DEFINE_PER_CPU(struct ipipe_timer, lapic_itimer) = {
+ .irq = ipipe_apic_vector_irq(LOCAL_TIMER_VECTOR),
+ .ack = lapic_itimer_ack,
+};
+#endif /* CONFIG_IPIPE */
/*
* The local apic timer can be used for any function which is CPU local.
@@ -659,6 +675,16 @@ static void setup_APIC_timer(void)
memcpy(levt, &lapic_clockevent, sizeof(*levt));
levt->cpumask = cpumask_of(smp_processor_id());
+#ifdef CONFIG_IPIPE
+ if (!(lapic_clockevent.features & CLOCK_EVT_FEAT_DUMMY))
+ levt->ipipe_timer = this_cpu_ptr(&lapic_itimer);
+ else {
+ static atomic_t once = ATOMIC_INIT(-1);
+ if (atomic_inc_and_test(&once))
+ printk(KERN_INFO
+ "I-pipe: cannot use LAPIC as a tick device\n");
+ }
+#endif /* CONFIG_IPIPE */
if (this_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) {
levt->name = "lapic-deadline";
@@ -1298,7 +1324,7 @@ void lapic_shutdown(void)
if (!boot_cpu_has(X86_FEATURE_APIC) && !apic_from_smp_config())
return;
- local_irq_save(flags);
+ flags = hard_local_irq_save();
#ifdef CONFIG_X86_32
if (!enabled_via_apicbase)
@@ -1308,7 +1334,7 @@ void lapic_shutdown(void)
disable_local_APIC();
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
/**
@@ -1558,7 +1584,7 @@ static bool apic_check_and_ack(union apic_ir *irr, union apic_ir *isr)
* per set bit.
*/
for_each_set_bit(bit, isr->map, APIC_IR_BITS)
- ack_APIC_irq();
+ __ack_APIC_irq();
return true;
}
@@ -2196,7 +2222,7 @@ __visible void __irq_entry smp_spurious_interrupt(struct pt_regs *regs)
if (v & (1 << (vector & 0x1f))) {
pr_info("Spurious interrupt (vector 0x%02x) on CPU#%d. Acked\n",
vector, smp_processor_id());
- ack_APIC_irq();
+ __ack_APIC_irq();
} else {
pr_info("Spurious interrupt (vector 0x%02x) on CPU#%d. Not pending!\n",
vector, smp_processor_id());
@@ -2654,12 +2680,12 @@ static int lapic_suspend(void)
apic_pm_state.apic_cmci = apic_read(APIC_LVTCMCI);
#endif
- local_irq_save(flags);
+ flags = hard_local_irq_save();
disable_local_APIC();
irq_remapping_disable();
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
return 0;
}
@@ -2672,7 +2698,7 @@ static void lapic_resume(void)
if (!apic_pm_state.active)
return;
- local_irq_save(flags);
+ flags = hard_local_irq_save();
/*
* IO-APIC and PIC have their own resume routines.
@@ -2730,7 +2756,7 @@ static void lapic_resume(void)
irq_remapping_reenable(x2apic_mode);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
/*
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 7862b152a052..d3762181070f 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -52,9 +52,9 @@ static void _flat_send_IPI_mask(unsigned long mask, int vector)
{
unsigned long flags;
- local_irq_save(flags);
+ flags = hard_local_irq_save();
__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
static void flat_send_IPI_mask(const struct cpumask *cpumask, int vector)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 1622cff009c9..0f8a1d4c0750 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -78,7 +78,7 @@
#define for_each_irq_pin(entry, head) \
list_for_each_entry(entry, &head, list)
-static DEFINE_RAW_SPINLOCK(ioapic_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(ioapic_lock);
static DEFINE_MUTEX(ioapic_mutex);
static unsigned int ioapic_dynirq_base;
static int ioapic_initialized;
@@ -466,13 +466,19 @@ static void io_apic_sync(struct irq_pin_list *entry)
readl(&io_apic->data);
}
+static inline void __mask_ioapic(struct mp_chip_data *data)
+{
+ io_apic_modify_irq(data, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
+}
+
static void mask_ioapic_irq(struct irq_data *irq_data)
{
struct mp_chip_data *data = irq_data->chip_data;
unsigned long flags;
raw_spin_lock_irqsave(&ioapic_lock, flags);
- io_apic_modify_irq(data, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
+ ipipe_lock_irq(irq_data->irq);
+ __mask_ioapic(data);
raw_spin_unlock_irqrestore(&ioapic_lock, flags);
}
@@ -488,6 +494,7 @@ static void unmask_ioapic_irq(struct irq_data *irq_data)
raw_spin_lock_irqsave(&ioapic_lock, flags);
__unmask_ioapic(data);
+ ipipe_unlock_irq(irq_data->irq);
raw_spin_unlock_irqrestore(&ioapic_lock, flags);
}
@@ -531,14 +538,20 @@ static void __eoi_ioapic_pin(int apic, int pin, int vector)
}
}
-static void eoi_ioapic_pin(int vector, struct mp_chip_data *data)
+static void _eoi_ioapic_pin(int vector, struct mp_chip_data *data)
{
- unsigned long flags;
struct irq_pin_list *entry;
- raw_spin_lock_irqsave(&ioapic_lock, flags);
for_each_irq_pin(entry, data->irq_2_pin)
__eoi_ioapic_pin(entry->apic, entry->pin, vector);
+}
+
+void eoi_ioapic_pin(int vector, struct mp_chip_data *data)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&ioapic_lock, flags);
+ _eoi_ioapic_pin(vector, data);
raw_spin_unlock_irqrestore(&ioapic_lock, flags);
}
@@ -1216,6 +1229,19 @@ EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
static struct irq_chip ioapic_chip, ioapic_ir_chip;
+#ifdef CONFIG_IPIPE
+static void startup_legacy_irq(unsigned irq)
+{
+ unsigned long flags;
+ legacy_pic->mask(irq);
+ flags = hard_local_irq_save();
+ __ipipe_unlock_irq(irq);
+ hard_local_irq_restore(flags);
+}
+#else /* !CONFIG_IPIPE */
+#define startup_legacy_irq(irq) legacy_pic->mask(irq)
+#endif /* !CONFIG_IPIPE */
+
static void __init setup_IO_APIC_irqs(void)
{
unsigned int ioapic, pin;
@@ -1699,11 +1725,12 @@ static unsigned int startup_ioapic_irq(struct irq_data *data)
raw_spin_lock_irqsave(&ioapic_lock, flags);
if (irq < nr_legacy_irqs()) {
- legacy_pic->mask(irq);
+ startup_legacy_irq(irq);
if (legacy_pic->irq_pending(irq))
was_pending = 1;
}
__unmask_ioapic(data->chip_data);
+ ipipe_unlock_irq(irq);
raw_spin_unlock_irqrestore(&ioapic_lock, flags);
return was_pending;
@@ -1711,7 +1738,7 @@ static unsigned int startup_ioapic_irq(struct irq_data *data)
atomic_t irq_mis_count;
-#ifdef CONFIG_GENERIC_PENDING_IRQ
+#if defined(CONFIG_GENERIC_PENDING_IRQ) || (defined(CONFIG_IPIPE) && defined(CONFIG_SMP))
static bool io_apic_level_ack_pending(struct mp_chip_data *data)
{
struct irq_pin_list *entry;
@@ -1796,9 +1823,9 @@ static void ioapic_ack_level(struct irq_data *irq_data)
{
struct irq_cfg *cfg = irqd_cfg(irq_data);
unsigned long v;
- bool masked;
int i;
-
+#ifndef CONFIG_IPIPE
+ bool masked;
irq_complete_move(cfg);
masked = ioapic_irqd_mask(irq_data);
@@ -1856,6 +1883,24 @@ static void ioapic_ack_level(struct irq_data *irq_data)
}
ioapic_irqd_unmask(irq_data, masked);
+#else /* CONFIG_IPIPE */
+ /*
+ * Prevent low priority IRQs grabbed by high priority domains
+ * from being delayed, waiting for a high priority interrupt
+ * handler running in a low priority domain to complete.
+ * This code assumes hw interrupts off.
+ */
+ i = cfg->vector;
+ v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
+ if (unlikely(!(v & (1 << (i & 0x1f))))) {
+ /* IO-APIC erratum: see comment above. */
+ atomic_inc(&irq_mis_count);
+ raw_spin_lock(&ioapic_lock);
+ _eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
+ raw_spin_unlock(&ioapic_lock);
+ }
+ __ack_APIC_irq();
+#endif /* CONFIG_IPIPE */
}
static void ioapic_ir_ack_level(struct irq_data *irq_data)
@@ -1951,6 +1996,69 @@ static int ioapic_irq_get_chip_state(struct irq_data *irqd,
return 0;
}
+#ifdef CONFIG_IPIPE
+
+#ifdef CONFIG_SMP
+
+void move_xxapic_irq(struct irq_data *irq_data)
+{
+ unsigned int irq = irq_data->irq;
+ struct irq_desc *desc = irq_to_desc(irq);
+ struct mp_chip_data *data = irq_data->chip_data;
+ struct irq_cfg *cfg = irqd_cfg(irq_data);
+
+ if (desc->handle_irq == &handle_edge_irq) {
+ raw_spin_lock(&desc->lock);
+ irq_complete_move(cfg);
+ irq_move_irq(irq_data);
+ raw_spin_unlock(&desc->lock);
+ } else if (desc->handle_irq == &handle_fasteoi_irq) {
+ raw_spin_lock(&desc->lock);
+ irq_complete_move(cfg);
+ if (unlikely(irqd_is_setaffinity_pending(irq_data))) {
+ if (!io_apic_level_ack_pending(data))
+ irq_move_masked_irq(irq_data);
+ unmask_ioapic_irq(irq_data);
+ }
+ raw_spin_unlock(&desc->lock);
+ } else
+ WARN_ON_ONCE(1);
+}
+
+#endif /* CONFIG_SMP */
+
+static void hold_ioapic_irq(struct irq_data *irq_data)
+{
+ struct mp_chip_data *data = irq_data->chip_data;
+
+ raw_spin_lock(&ioapic_lock);
+ __mask_ioapic(data);
+ raw_spin_unlock(&ioapic_lock);
+ ioapic_ack_level(irq_data);
+}
+
+static void hold_ioapic_ir_irq(struct irq_data *irq_data)
+{
+ struct mp_chip_data *data = irq_data->chip_data;
+
+ raw_spin_lock(&ioapic_lock);
+ __mask_ioapic(data);
+ raw_spin_unlock(&ioapic_lock);
+ ioapic_ir_ack_level(irq_data);
+}
+
+static void release_ioapic_irq(struct irq_data *irq_data)
+{
+ struct mp_chip_data *data = irq_data->chip_data;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&ioapic_lock, flags);
+ __unmask_ioapic(data);
+ raw_spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+#endif /* CONFIG_IPIPE */
+
static struct irq_chip ioapic_chip __read_mostly = {
.name = "IO-APIC",
.irq_startup = startup_ioapic_irq,
@@ -1961,6 +2069,13 @@ static struct irq_chip ioapic_chip __read_mostly = {
.irq_set_affinity = ioapic_set_affinity,
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_get_irqchip_state = ioapic_irq_get_chip_state,
+#ifdef CONFIG_IPIPE
+#ifdef CONFIG_SMP
+ .irq_move = move_xxapic_irq,
+#endif
+ .irq_hold = hold_ioapic_irq,
+ .irq_release = release_ioapic_irq,
+#endif
.flags = IRQCHIP_SKIP_SET_WAKE |
IRQCHIP_AFFINITY_PRE_STARTUP,
};
@@ -1975,6 +2090,13 @@ static struct irq_chip ioapic_ir_chip __read_mostly = {
.irq_set_affinity = ioapic_set_affinity,
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_get_irqchip_state = ioapic_irq_get_chip_state,
+#ifdef CONFIG_IPIPE
+#ifdef CONFIG_SMP
+ .irq_move = move_xxapic_irq,
+#endif
+ .irq_hold = hold_ioapic_ir_irq,
+ .irq_release = release_ioapic_irq,
+#endif
.flags = IRQCHIP_SKIP_SET_WAKE |
IRQCHIP_AFFINITY_PRE_STARTUP,
};
@@ -2007,23 +2129,29 @@ static inline void init_IO_APIC_traps(void)
static void mask_lapic_irq(struct irq_data *data)
{
- unsigned long v;
+ unsigned long v, flags;
+ flags = hard_cond_local_irq_save();
+ ipipe_lock_irq(data->irq);
v = apic_read(APIC_LVT0);
apic_write(APIC_LVT0, v | APIC_LVT_MASKED);
+ hard_cond_local_irq_restore(flags);
}
static void unmask_lapic_irq(struct irq_data *data)
{
- unsigned long v;
+ unsigned long v, flags;
+ flags = hard_cond_local_irq_save();
v = apic_read(APIC_LVT0);
apic_write(APIC_LVT0, v & ~APIC_LVT_MASKED);
+ ipipe_unlock_irq(data->irq);
+ hard_cond_local_irq_restore(flags);
}
static void ack_lapic_irq(struct irq_data *data)
{
- ack_APIC_irq();
+ __ack_APIC_irq();
}
static struct irq_chip lapic_chip __read_mostly = {
@@ -2031,6 +2159,9 @@ static struct irq_chip lapic_chip __read_mostly = {
.irq_mask = mask_lapic_irq,
.irq_unmask = unmask_lapic_irq,
.irq_ack = ack_lapic_irq,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+ .irq_move = move_xxapic_irq,
+#endif
};
static void lapic_register_intr(int irq)
@@ -2153,7 +2284,7 @@ static inline void __init check_timer(void)
/*
* get/set the timer IRQ vector:
*/
- legacy_pic->mask(0);
+ startup_legacy_irq(0);
/*
* As IRQ0 is to be enabled in the 8259A, the virtual
@@ -2250,6 +2381,10 @@ static inline void __init check_timer(void)
"...trying to set up timer as Virtual Wire IRQ...\n");
lapic_register_intr(0);
+#if defined(CONFIG_IPIPE) && defined(CONFIG_X86_64)
+ irq_to_desc(0)->ipipe_ack = __ipipe_ack_edge_irq;
+ irq_to_desc(0)->ipipe_end = __ipipe_nop_irq;
+#endif
apic_write(APIC_LVT0, APIC_DM_FIXED | cfg->vector); /* Fixed mode */
legacy_pic->unmask(0);
@@ -2258,7 +2393,7 @@ static inline void __init check_timer(void)
goto out;
}
local_irq_disable();
- legacy_pic->mask(0);
+ startup_legacy_irq(0);
apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
apic_printk(APIC_QUIET, KERN_INFO "..... failed.\n");
@@ -2636,6 +2771,21 @@ int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity)
return 0;
}
+#ifdef CONFIG_IPIPE
+unsigned int __ipipe_get_ioapic_irq_vector(int irq)
+{
+ if (irq >= IPIPE_FIRST_APIC_IRQ && irq < IPIPE_NR_XIRQS)
+ return ipipe_apic_irq_vector(irq);
+ else if (irq == IRQ_MOVE_CLEANUP_VECTOR)
+ return irq;
+ else {
+ if (irq_cfg(irq) == NULL)
+ return ISA_IRQ_VECTOR(irq); /* Assume ISA. */
+ return irq_cfg(irq)->vector;
+ }
+}
+#endif /* CONFIG_IPIPE */
+
/*
* This function updates target affinity of IOAPIC interrupts to include
* the CPUs which came online during SMP bringup.
@@ -3036,7 +3186,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
mp_setup_entry(cfg, data, info->ioapic_entry);
mp_register_handler(virq, data->trigger);
if (virq < nr_legacy_irqs())
- legacy_pic->mask(virq);
+ startup_legacy_irq(virq);
local_irq_restore(flags);
apic_printk(APIC_VERBOSE, KERN_DEBUG
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 6ca0f91372fd..b6784487a2c2 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -117,7 +117,9 @@ void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
* to the APIC.
*/
unsigned int cfg;
+ unsigned long flags;
+ flags = hard_cond_local_irq_save();
/*
* Wait for idle.
*/
@@ -136,6 +138,8 @@ void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
* Send the IPI. The write to APIC_ICR fires this off.
*/
native_apic_mem_write(APIC_ICR, cfg);
+
+ hard_cond_local_irq_restore(flags);
}
/*
@@ -144,8 +148,9 @@ void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
*/
void __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest)
{
- unsigned long cfg;
+ unsigned long cfg, flags;
+ flags = hard_cond_local_irq_save();
/*
* Wait for idle.
*/
@@ -169,6 +174,8 @@ void __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int d
* Send the IPI. The write to APIC_ICR fires this off.
*/
native_apic_mem_write(APIC_ICR, cfg);
+
+ hard_cond_local_irq_restore(flags);
}
void default_send_IPI_single_phys(int cpu, int vector)
@@ -191,12 +198,12 @@ void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, int vector)
* to an arbitrary mask, so I do a unicast to each CPU instead.
* - mbligh
*/
- local_irq_save(flags);
+ flags = hard_local_irq_save();
for_each_cpu(query_cpu, mask) {
__default_send_IPI_dest_field(per_cpu(x86_cpu_to_apicid,
query_cpu), vector, APIC_DEST_PHYSICAL);
}
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
@@ -208,14 +215,14 @@ void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
/* See Hack comment above */
- local_irq_save(flags);
+ flags = hard_local_irq_save();
for_each_cpu(query_cpu, mask) {
if (query_cpu == this_cpu)
continue;
__default_send_IPI_dest_field(per_cpu(x86_cpu_to_apicid,
query_cpu), vector, APIC_DEST_PHYSICAL);
}
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
/*
@@ -255,12 +262,12 @@ void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
* should be modified to do 1 message per cluster ID - mbligh
*/
- local_irq_save(flags);
+ flags = hard_local_irq_save();
for_each_cpu(query_cpu, mask)
__default_send_IPI_dest_field(
early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
vector, apic->dest_logical);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
@@ -272,7 +279,7 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
/* See Hack comment above */
- local_irq_save(flags);
+ flags = hard_local_irq_save();
for_each_cpu(query_cpu, mask) {
if (query_cpu == this_cpu)
continue;
@@ -280,7 +287,7 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
vector, apic->dest_logical);
}
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
/*
@@ -294,10 +301,10 @@ void default_send_IPI_mask_logical(const struct cpumask *cpumask, int vector)
if (!mask)
return;
- local_irq_save(flags);
+ flags = hard_local_irq_save();
WARN_ON(mask & ~cpumask_bits(cpu_online_mask)[0]);
__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
/* must come after the send_IPI functions above for inlining */
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index f86e10b1d99c..0f3f2cd36f03 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -180,8 +180,12 @@ static struct irq_chip pci_msi_controller = {
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_compose_msi_msg = irq_msi_compose_msg,
.irq_set_affinity = msi_set_affinity,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+ .irq_move = move_xxapic_irq,
+#endif
.flags = IRQCHIP_SKIP_SET_WAKE |
- IRQCHIP_AFFINITY_PRE_STARTUP,
+ IRQCHIP_AFFINITY_PRE_STARTUP |
+ IRQCHIP_PIPELINE_SAFE,
};
int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
@@ -282,8 +286,12 @@ static struct irq_chip pci_msi_ir_controller = {
.irq_ack = irq_chip_ack_parent,
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+ .irq_move = move_xxapic_irq,
+#endif
.flags = IRQCHIP_SKIP_SET_WAKE |
- IRQCHIP_AFFINITY_PRE_STARTUP,
+ IRQCHIP_AFFINITY_PRE_STARTUP |
+ IRQCHIP_PIPELINE_SAFE,
};
static struct msi_domain_info pci_msi_ir_domain_info = {
@@ -326,8 +334,12 @@ static struct irq_chip dmar_msi_controller = {
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_compose_msi_msg = irq_msi_compose_msg,
.irq_write_msi_msg = dmar_msi_write_msg,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+ .irq_move = move_xxapic_irq,
+#endif
.flags = IRQCHIP_SKIP_SET_WAKE |
- IRQCHIP_AFFINITY_PRE_STARTUP,
+ IRQCHIP_AFFINITY_PRE_STARTUP |
+ IRQCHIP_PIPELINE_SAFE,
};
static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
@@ -425,7 +437,11 @@ static struct irq_chip hpet_msi_controller __ro_after_init = {
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_compose_msi_msg = irq_msi_compose_msg,
.irq_write_msi_msg = hpet_msi_write_msg,
- .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_AFFINITY_PRE_STARTUP,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+ .irq_move = move_xxapic_irq,
+#endif
+ .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_AFFINITY_PRE_STARTUP |
+ IRQCHIP_PIPELINE_SAFE,
};
static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6b8b6bf6c5d1..32889c2ef4a1 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -39,7 +39,7 @@ struct apic_chip_data {
struct irq_domain *x86_vector_domain;
EXPORT_SYMBOL_GPL(x86_vector_domain);
-static DEFINE_RAW_SPINLOCK(vector_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(vector_lock);
static cpumask_var_t vector_searchmask;
static struct irq_chip lapic_controller;
static struct irq_matrix *vector_matrix;
@@ -119,7 +119,9 @@ static void apic_update_irq_cfg(struct irq_data *irqd, unsigned int vector,
{
struct apic_chip_data *apicd = apic_chip_data(irqd);
+#ifndef CONFIG_IPIPE
lockdep_assert_held(&vector_lock);
+#endif
apicd->hw_irq_cfg.vector = vector;
apicd->hw_irq_cfg.dest_apicid = apic->calc_dest_apicid(cpu);
@@ -135,7 +137,9 @@ static void apic_update_vector(struct irq_data *irqd, unsigned int newvec,
struct irq_desc *desc = irq_data_to_desc(irqd);
bool managed = irqd_affinity_is_managed(irqd);
+#ifndef CONFIG_IPIPE
lockdep_assert_held(&vector_lock);
+#endif
trace_vector_update(irqd->irq, newvec, newcpu, apicd->vector,
apicd->cpu);
@@ -225,7 +229,9 @@ assign_vector_locked(struct irq_data *irqd, const struct cpumask *dest)
unsigned int cpu = apicd->cpu;
int vector = apicd->vector;
+#ifndef CONFIG_IPIPE
lockdep_assert_held(&vector_lock);
+#endif
/*
* If the current target CPU is online and in the new requested
@@ -336,7 +342,9 @@ static void clear_irq_vector(struct irq_data *irqd)
bool managed = irqd_affinity_is_managed(irqd);
unsigned int vector = apicd->vector;
+#ifndef CONFIG_IPIPE
lockdep_assert_held(&vector_lock);
+#endif
if (!vector)
return;
@@ -766,7 +774,9 @@ void lapic_online(void)
{
unsigned int vector;
+#ifndef CONFIG_IPIPE
lockdep_assert_held(&vector_lock);
+#endif
/* Online the vector matrix array for this CPU */
irq_matrix_online(vector_matrix);
@@ -827,13 +837,17 @@ static int apic_retrigger_irq(struct irq_data *irqd)
void apic_ack_irq(struct irq_data *irqd)
{
+#ifndef CONFIG_IPIPE
irq_move_irq(irqd);
- ack_APIC_irq();
+#endif /* !CONFIG_IPIPE */
+ __ack_APIC_irq();
}
void apic_ack_edge(struct irq_data *irqd)
{
+#ifndef CONFIG_IPIPE
irq_complete_move(irqd_cfg(irqd));
+#endif /* !CONFIG_IPIPE */
apic_ack_irq(irqd);
}
@@ -935,7 +949,13 @@ static void __irq_complete_move(struct irq_cfg *cfg, unsigned vector)
if (likely(!apicd->move_in_progress))
return;
- if (vector == apicd->vector && apicd->cpu == smp_processor_id())
+ /*
+ * If the interrupt arrived on the new target CPU, cleanup the
+ * vector on the old target CPU. A vector check is not required
+ * because an interrupt can never move from one vector to another
+ * on the same CPU.
+ */
+ if (apicd->cpu == smp_processor_id())
__send_cleanup_vector(apicd);
}
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 7eec3c154fa2..52fdf80e4d8b 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -44,7 +44,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
/* x2apic MSRs are special and need a special fence: */
weak_wrmsr_fence();
- local_irq_save(flags);
+ flags = hard_local_irq_save();
tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask);
cpumask_copy(tmpmsk, mask);
@@ -68,7 +68,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
}
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
static void x2apic_send_IPI_mask(const struct cpumask *mask, int vector)
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 032a00e5d9fa..72ebc33401c2 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -58,7 +58,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
/* x2apic MSRs are special and need a special fence: */
weak_wrmsr_fence();
- local_irq_save(flags);
+ flags = hard_local_irq_save();
this_cpu = smp_processor_id();
for_each_cpu(query_cpu, mask) {
@@ -67,7 +67,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
__x2apic_send_IPI_dest(per_cpu(x86_cpu_to_apicid, query_cpu),
vector, APIC_DEST_PHYSICAL);
}
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
static void x2apic_send_IPI_mask(const struct cpumask *mask, int vector)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 5c7ee3df4d0b..343927455003 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -38,6 +38,9 @@ static void __used common(void)
#endif
BLANK();
+#ifdef CONFIG_IPIPE
+ OFFSET(TASK_TI_ipipe, task_struct, thread_info.ipipe_flags);
+#endif
OFFSET(TASK_addr_limit, task_struct, thread.addr_limit);
BLANK();
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5e1e32f1086b..0c8317b8170d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1849,6 +1849,7 @@ void syscall_init(void)
X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
}
+#ifndef CONFIG_IPIPE
DEFINE_PER_CPU(int, debug_stack_usage);
DEFINE_PER_CPU(u32, debug_idt_ctr);
@@ -1867,6 +1868,7 @@ void debug_stack_reset(void)
load_current_idt();
}
NOKPROBE_SYMBOL(debug_stack_reset);
+#endif /* !CONFIG_IPIPE */
#else /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/cpu/mtrr/cyrix.c b/arch/x86/kernel/cpu/mtrr/cyrix.c
index 72182809b333..a9d703aab500 100644
--- a/arch/x86/kernel/cpu/mtrr/cyrix.c
+++ b/arch/x86/kernel/cpu/mtrr/cyrix.c
@@ -19,7 +19,7 @@ cyrix_get_arr(unsigned int reg, unsigned long *base,
arr = CX86_ARR_BASE + (reg << 1) + reg; /* avoid multiplication by 3 */
- local_irq_save(flags);
+ flags = hard_local_irq_save();
ccr3 = getCx86(CX86_CCR3);
setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10); /* enable MAPEN */
@@ -29,7 +29,7 @@ cyrix_get_arr(unsigned int reg, unsigned long *base,
rcr = getCx86(CX86_RCR_BASE + reg);
setCx86(CX86_CCR3, ccr3); /* disable MAPEN */
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
shift = ((unsigned char *) base)[1] & 0x0f;
*base >>= PAGE_SHIFT;
@@ -180,6 +180,7 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
unsigned long size, mtrr_type type)
{
unsigned char arr, arr_type, arr_size;
+ unsigned long flags;
arr = CX86_ARR_BASE + (reg << 1) + reg; /* avoid multiplication by 3 */
@@ -223,6 +224,8 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
}
}
+ flags = hard_local_irq_save();
+
prepare_set();
base <<= PAGE_SHIFT;
@@ -232,6 +235,8 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
setCx86(CX86_RCR_BASE + reg, arr_type);
post_set();
+
+ hard_local_irq_restore(flags);
}
typedef struct {
@@ -249,8 +254,10 @@ static unsigned char ccr_state[7] = { 0, 0, 0, 0, 0, 0, 0 };
static void cyrix_set_all(void)
{
+ unsigned long flags;
int i;
+ flags = hard_local_irq_save();
prepare_set();
/* the CCRs are not contiguous */
@@ -265,6 +272,7 @@ static void cyrix_set_all(void)
}
post_set();
+ hard_local_irq_restore(flags);
}
static const struct mtrr_ops cyrix_mtrr_ops = {
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 4ea906fe1c35..3aafccf52f75 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -797,7 +797,7 @@ static void generic_set_all(void)
unsigned long mask, count;
unsigned long flags;
- local_irq_save(flags);
+ flags = hard_local_irq_save();
prepare_set();
/* Actually set the state */
@@ -807,7 +807,7 @@ static void generic_set_all(void)
pat_init();
post_set();
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
/* Use the atomic bitops to update the global mask */
for (count = 0; count < sizeof(mask) * 8; ++count) {
@@ -831,12 +831,13 @@ static void generic_set_all(void)
static void generic_set_mtrr(unsigned int reg, unsigned long base,
unsigned long size, mtrr_type type)
{
- unsigned long flags;
+ unsigned long rflags, vflags;
struct mtrr_var_range *vr;
vr = &mtrr_state.var_ranges[reg];
- local_irq_save(flags);
+ local_irq_save(vflags);
+ rflags = hard_local_irq_save();
prepare_set();
if (size == 0) {
@@ -857,7 +858,8 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
}
post_set();
- local_irq_restore(flags);
+ hard_local_irq_restore(rflags);
+ local_irq_restore(vflags);
}
int generic_validate_add_page(unsigned long base, unsigned long size,
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8c9b202f3e6d..e06c799149db 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -36,18 +36,13 @@ union fpregs_state init_fpstate __read_mostly;
*
* - to debug kernel_fpu_begin()/end() correctness
*/
-static DEFINE_PER_CPU(bool, in_kernel_fpu);
+DEFINE_PER_CPU(bool, in_kernel_fpu);
/*
* Track which context is using the FPU on the CPU:
*/
DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
-static bool kernel_fpu_disabled(void)
-{
- return this_cpu_read(in_kernel_fpu);
-}
-
static bool interrupted_kernel_fpu_idle(void)
{
return !kernel_fpu_disabled();
@@ -84,12 +79,13 @@ EXPORT_SYMBOL(irq_fpu_usable);
void kernel_fpu_begin_mask(unsigned int kfpu_mask)
{
- preempt_disable();
+ unsigned long flags;
+ preempt_disable();
WARN_ON_FPU(!irq_fpu_usable());
- WARN_ON_FPU(this_cpu_read(in_kernel_fpu));
- this_cpu_write(in_kernel_fpu, true);
+ flags = hard_cond_local_irq_save();
+ kernel_fpu_disable();
if (!(current->flags & PF_KTHREAD) &&
!test_thread_flag(TIF_NEED_FPU_LOAD)) {
@@ -101,6 +97,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
copy_fpregs_to_fpstate(&current->thread.fpu);
}
__cpu_invalidate_fpregs_state();
+ hard_cond_local_irq_restore(flags);
/* Put sane initial values into the control registers. */
if (likely(kfpu_mask & KFPU_MXCSR) && boot_cpu_has(X86_FEATURE_XMM))
@@ -113,9 +110,11 @@ EXPORT_SYMBOL_GPL(kernel_fpu_begin_mask);
void kernel_fpu_end(void)
{
- WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
+ unsigned long flags;
- this_cpu_write(in_kernel_fpu, false);
+ flags = hard_cond_local_irq_save();
+ kernel_fpu_enable();
+ hard_cond_local_irq_restore(flags);
preempt_enable();
}
EXPORT_SYMBOL_GPL(kernel_fpu_end);
@@ -127,9 +126,11 @@ EXPORT_SYMBOL_GPL(kernel_fpu_end);
*/
void fpu__save(struct fpu *fpu)
{
+ unsigned long flags;
+
WARN_ON_FPU(fpu != &current->thread.fpu);
- fpregs_lock();
+ flags = fpregs_lock();
trace_x86_fpu_before_save(fpu);
if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
@@ -139,7 +140,7 @@ void fpu__save(struct fpu *fpu)
}
trace_x86_fpu_after_save(fpu);
- fpregs_unlock();
+ fpregs_unlock(flags);
}
/*
@@ -175,6 +176,7 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
{
struct fpu *dst_fpu = &dst->thread.fpu;
struct fpu *src_fpu = &src->thread.fpu;
+ unsigned long flags;
dst_fpu->last_cpu = -1;
@@ -197,14 +199,14 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
* ( The function 'fails' in the FNSAVE case, which destroys
* register contents so we have to load them back. )
*/
- fpregs_lock();
+ flags = fpregs_lock();
if (test_thread_flag(TIF_NEED_FPU_LOAD))
memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_size);
else if (!copy_fpregs_to_fpstate(dst_fpu))
copy_kernel_to_fpregs(&dst_fpu->state);
- fpregs_unlock();
+ fpregs_unlock(flags);
set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD);
@@ -218,7 +220,7 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
* Activate the current task's in-memory FPU context,
* if it has not been used before:
*/
-static void fpu__initialize(struct fpu *fpu)
+void fpu__initialize(struct fpu *fpu)
{
WARN_ON_FPU(fpu != &current->thread.fpu);
@@ -271,6 +273,14 @@ void fpu__prepare_write(struct fpu *fpu)
__fpu_invalidate_fpregs_state(fpu);
}
+#ifdef CONFIG_IPIPE
+#define FWAIT_PROLOGUE "sti\n"
+#define FWAIT_EPILOGUE "cli\n"
+#else
+#define FWAIT_PROLOGUE
+#define FWAIT_EPILOGUE
+#endif
+
/*
* Drops current FPU state: deactivates the fpregs and
* the fpstate. NOTE: it still leaves previous contents
@@ -282,19 +292,22 @@ void fpu__prepare_write(struct fpu *fpu)
*/
void fpu__drop(struct fpu *fpu)
{
- preempt_disable();
+ unsigned long flags;
+ flags = hard_preempt_disable();
if (fpu == &current->thread.fpu) {
/* Ignore delayed exceptions from user space */
- asm volatile("1: fwait\n"
+ asm volatile(FWAIT_PROLOGUE
+ "1: fwait\n"
"2:\n"
+ FWAIT_EPILOGUE
_ASM_EXTABLE(1b, 2b));
fpregs_deactivate(fpu);
}
trace_x86_fpu_dropped(fpu);
- preempt_enable();
+ hard_preempt_enable(flags);
}
/*
@@ -303,7 +316,9 @@ void fpu__drop(struct fpu *fpu)
*/
static inline void copy_init_fpstate_to_fpregs(void)
{
- fpregs_lock();
+ unsigned long flags;
+
+ flags = fpregs_lock();
if (use_xsave())
copy_kernel_to_xregs(&init_fpstate.xsave, -1);
@@ -316,7 +331,7 @@ static inline void copy_init_fpstate_to_fpregs(void)
copy_init_pkru_to_fpregs();
fpregs_mark_activate();
- fpregs_unlock();
+ fpregs_unlock(flags);
}
/*
@@ -327,8 +342,11 @@ static inline void copy_init_fpstate_to_fpregs(void)
*/
void fpu__clear(struct fpu *fpu)
{
+ unsigned long flags;
WARN_ON_FPU(fpu != &current->thread.fpu); /* Almost certainly an anomaly */
+ flags = hard_cond_local_irq_save();
+
fpu__drop(fpu);
/*
@@ -337,6 +355,8 @@ void fpu__clear(struct fpu *fpu)
fpu__initialize(fpu);
if (static_cpu_has(X86_FEATURE_FPU))
copy_init_fpstate_to_fpregs();
+
+ hard_cond_local_irq_restore(flags);
}
/*
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index ab2f9c2f0683..3ad4f9750dd7 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -61,11 +61,12 @@ static inline int save_fsave_header(struct task_struct *tsk, void __user *buf)
struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
struct user_i387_ia32_struct env;
struct _fpstate_32 __user *fp = buf;
+ unsigned long flags;
- fpregs_lock();
+ flags = fpregs_lock();
if (!test_thread_flag(TIF_NEED_FPU_LOAD))
copy_fxregs_to_kernel(&tsk->thread.fpu);
- fpregs_unlock();
+ fpregs_unlock(flags);
convert_from_fxsr(&env, tsk);
@@ -165,6 +166,7 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
{
struct task_struct *tsk = current;
int ia32_fxstate = (buf != buf_fx);
+ unsigned long flags;
int ret;
ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
@@ -185,14 +187,14 @@ retry:
* userland's stack frame which will likely succeed. If it does not,
* resolve the fault in the user memory and try again.
*/
- fpregs_lock();
+ flags = fpregs_lock();
if (test_thread_flag(TIF_NEED_FPU_LOAD))
__fpregs_load_activate();
pagefault_disable();
ret = copy_fpregs_to_sigframe(buf_fx);
pagefault_enable();
- fpregs_unlock();
+ fpregs_unlock(flags);
if (ret) {
if (!fault_in_pages_writeable(buf_fx, fpu_user_xstate_size))
@@ -277,6 +279,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
struct task_struct *tsk = current;
struct fpu *fpu = &tsk->thread.fpu;
struct user_i387_ia32_struct env;
+ unsigned long flags;
u64 xfeatures = 0;
int fx_only = 0;
int ret = 0;
@@ -347,17 +350,17 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
* going through the kernel buffer with the enabled pagefault
* handler.
*/
- fpregs_lock();
+ flags = fpregs_lock();
pagefault_disable();
ret = copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only);
pagefault_enable();
if (!ret) {
fpregs_mark_activate();
- fpregs_unlock();
+ fpregs_unlock(flags);
return 0;
}
fpregs_deactivate(fpu);
- fpregs_unlock();
+ fpregs_unlock(flags);
}
@@ -377,7 +380,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only);
- fpregs_lock();
+ flags = fpregs_lock();
if (unlikely(init_bv))
copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
ret = copy_kernel_to_xregs_err(&fpu->state.xsave, xfeatures);
@@ -391,7 +394,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only);
- fpregs_lock();
+ flags = fpregs_lock();
if (use_xsave()) {
u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
@@ -403,14 +406,14 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
if (ret)
goto out;
- fpregs_lock();
+ flags = fpregs_lock();
ret = copy_kernel_to_fregs_err(&fpu->state.fsave);
}
if (!ret)
fpregs_mark_activate();
else
fpregs_deactivate(fpu);
- fpregs_unlock();
+ fpregs_unlock(flags);
out:
if (ret)
diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index 8821d0ab0a08..c4a51c25ce83 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -33,7 +33,7 @@
static void init_8259A(int auto_eoi);
static int i8259A_auto_eoi;
-DEFINE_RAW_SPINLOCK(i8259A_lock);
+IPIPE_DEFINE_RAW_SPINLOCK(i8259A_lock);
/*
* 8259A PIC functions to handle ISA devices:
@@ -61,6 +61,7 @@ static void mask_8259A_irq(unsigned int irq)
unsigned long flags;
raw_spin_lock_irqsave(&i8259A_lock, flags);
+ ipipe_lock_irq(irq);
cached_irq_mask |= mask;
if (irq & 8)
outb(cached_slave_mask, PIC_SLAVE_IMR);
@@ -76,15 +77,18 @@ static void disable_8259A_irq(struct irq_data *data)
static void unmask_8259A_irq(unsigned int irq)
{
- unsigned int mask = ~(1 << irq);
+ unsigned int mask = (1 << irq);
unsigned long flags;
raw_spin_lock_irqsave(&i8259A_lock, flags);
- cached_irq_mask &= mask;
- if (irq & 8)
- outb(cached_slave_mask, PIC_SLAVE_IMR);
- else
- outb(cached_master_mask, PIC_MASTER_IMR);
+ if (cached_irq_mask & mask) {
+ cached_irq_mask &= ~mask;
+ if (irq & 8)
+ outb(cached_slave_mask, PIC_SLAVE_IMR);
+ else
+ outb(cached_master_mask, PIC_MASTER_IMR);
+ ipipe_unlock_irq(irq);
+ }
raw_spin_unlock_irqrestore(&i8259A_lock, flags);
}
@@ -172,6 +176,18 @@ static void mask_and_ack_8259A(struct irq_data *data)
*/
if (cached_irq_mask & irqmask)
goto spurious_8259A_irq;
+#ifdef CONFIG_IPIPE
+ if (irq == 0) {
+ /*
+ * Fast timer ack -- don't mask (unless supposedly
+ * spurious). We trace outb's in order to detect
+ * broken hardware inducing large delays.
+ */
+ outb(0x60, PIC_MASTER_CMD); /* Specific EOI to master. */
+ raw_spin_unlock_irqrestore(&i8259A_lock, flags);
+ return;
+ }
+#endif /* CONFIG_IPIPE */
cached_irq_mask |= irqmask;
handle_real_irq:
@@ -228,6 +244,7 @@ struct irq_chip i8259A_chip = {
.irq_disable = disable_8259A_irq,
.irq_unmask = enable_8259A_irq,
.irq_mask_ack = mask_and_ack_8259A,
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static char irq_trigger[2];
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 7bb4c3cbf4dc..ea919100022a 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -114,6 +114,10 @@ static const __initconst struct idt_data apic_idts[] = {
INTG(CALL_FUNCTION_SINGLE_VECTOR, call_function_single_interrupt),
INTG(IRQ_MOVE_CLEANUP_VECTOR, irq_move_cleanup_interrupt),
INTG(REBOOT_VECTOR, reboot_interrupt),
+#ifdef CONFIG_IPIPE
+ INTG(IPIPE_RESCHEDULE_VECTOR, ipipe_reschedule_interrupt),
+ INTG(IPIPE_CRITICAL_VECTOR, ipipe_critical_interrupt),
+#endif
#endif
#ifdef CONFIG_X86_THERMAL_VECTOR
@@ -144,6 +148,9 @@ static const __initconst struct idt_data apic_idts[] = {
#endif
INTG(SPURIOUS_APIC_VECTOR, spurious_interrupt),
INTG(ERROR_APIC_VECTOR, error_interrupt),
+#ifdef CONFIG_IPIPE
+ INTG(IPIPE_HRTIMER_VECTOR, ipipe_hrtimer_interrupt),
+#endif
#endif
};
@@ -308,9 +315,26 @@ void __init idt_setup_apic_and_irq_gates(void)
{
int i = FIRST_EXTERNAL_VECTOR;
void *entry;
+ unsigned int __maybe_unused cpu, ret;
idt_setup_from_table(idt_table, apic_idts, ARRAY_SIZE(apic_idts), true);
+#if defined(CONFIG_SMP) && defined(CONFIG_IPIPE)
+ /*
+ * The cleanup vector is not part of the system vector range
+ * but rather belongs to the external IRQ range. However, we
+ * still need to map it early to a legit interrupt number for
+ * pipelining. Allocate a specific descriptor manually for it,
+ * using IRQ_MOVE_CLEANUP_VECTOR as both the vector number and
+ * interrupt number, so that we know the latter at build time.
+ */
+ ret = irq_alloc_descs(IRQ_MOVE_CLEANUP_VECTOR, 0, 1, 0);
+ BUG_ON(IRQ_MOVE_CLEANUP_VECTOR != ret);
+ for_each_possible_cpu(cpu)
+ per_cpu(vector_irq, cpu)[IRQ_MOVE_CLEANUP_VECTOR] =
+ irq_to_desc(IRQ_MOVE_CLEANUP_VECTOR);
+#endif
+
for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR) {
entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
set_intr_gate(i, entry);
diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
new file mode 100644
index 000000000000..300d5ee8187f
--- /dev/null
+++ b/arch/x86/kernel/ipipe.c
@@ -0,0 +1,560 @@
+/* -*- linux-c -*-
+ * linux/arch/x86/kernel/ipipe.c
+ *
+ * Copyright (C) 2002-2012 Philippe Gerum.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Architecture-dependent I-PIPE support for x86.
+ */
+
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sched/debug.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include <linux/clockchips.h>
+#include <linux/kprobes.h>
+#include <linux/mm.h>
+#include <linux/extable.h>
+#include <linux/ipipe_tickdev.h>
+#include <asm/asm-offsets.h>
+#include <asm/unistd.h>
+#include <asm/processor.h>
+#include <asm/atomic.h>
+#include <asm/hw_irq.h>
+#include <asm/irq.h>
+#include <asm/desc.h>
+#include <asm/io.h>
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/tlbflush.h>
+#include <asm/fixmap.h>
+#include <asm/bitops.h>
+#include <asm/mpspec.h>
+#ifdef CONFIG_X86_IO_APIC
+#include <asm/io_apic.h>
+#endif /* CONFIG_X86_IO_APIC */
+#include <asm/apic.h>
+#endif /* CONFIG_X86_LOCAL_APIC */
+#include <asm/fpu/internal.h>
+#include <asm/traps.h>
+#include <asm/tsc.h>
+#include <asm/mce.h>
+#include <asm/mmu_context.h>
+
+void smp_apic_timer_interrupt(struct pt_regs *regs);
+void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs);
+void smp_kvm_posted_intr_ipi(struct pt_regs *regs);
+void smp_spurious_interrupt(struct pt_regs *regs);
+void smp_error_interrupt(struct pt_regs *regs);
+void smp_x86_platform_ipi(struct pt_regs *regs);
+void smp_irq_work_interrupt(struct pt_regs *regs);
+void smp_reschedule_interrupt(struct pt_regs *regs);
+void smp_call_function_interrupt(struct pt_regs *regs);
+void smp_call_function_single_interrupt(struct pt_regs *regs);
+void smp_irq_move_cleanup_interrupt(void);
+void smp_reboot_interrupt(void);
+void smp_thermal_interrupt(struct pt_regs *regs);
+void smp_threshold_interrupt(struct pt_regs *regs);
+
+void __ipipe_do_IRQ(unsigned int irq, void *cookie);
+
+DEFINE_PER_CPU(unsigned long, __ipipe_cr2);
+EXPORT_PER_CPU_SYMBOL_GPL(__ipipe_cr2);
+
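+/*
+ * ipipe_get_sysinfo() -- Report basic system data to the real-time
+ * core: number of online CPUs, CPU frequency, and the high-resolution
+ * timer/clock settings recorded for the boot CPU.
+ */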
+int ipipe_get_sysinfo(struct ipipe_sysinfo *info)
+{
+ info->sys_nr_cpus = num_online_cpus();
+ info->sys_cpu_freq = __ipipe_cpu_freq;
+ info->sys_hrtimer_irq = per_cpu(ipipe_percpu.hrtimer_irq, 0);
+ info->sys_hrtimer_freq = __ipipe_hrtimer_freq;
+ info->sys_hrclock_freq = __ipipe_hrclock_freq;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ipipe_get_sysinfo);
+
+#ifdef CONFIG_X86_LOCAL_APIC
+
+static void __ipipe_noack_apic(struct irq_desc *desc)
+{
+}
+
+static void __ipipe_ack_apic(struct irq_desc *desc)
+{
+ __ack_APIC_irq();
+}
+
+#endif /* CONFIG_X86_LOCAL_APIC */
+
+/*
+ * __ipipe_enable_pipeline() -- We are running on the boot CPU, hw
+ * interrupts are off, and secondary CPUs are still lost in space.
+ */
+void __init __ipipe_enable_pipeline(void)
+{
+ unsigned int irq;
+
+#ifdef CONFIG_X86_LOCAL_APIC
+
+ /* Map the APIC system vectors. */
+
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(LOCAL_TIMER_VECTOR),
+ __ipipe_do_IRQ, smp_apic_timer_interrupt,
+ __ipipe_ack_apic);
+
+#ifdef CONFIG_HAVE_KVM
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(POSTED_INTR_WAKEUP_VECTOR),
+ __ipipe_do_IRQ, smp_kvm_posted_intr_wakeup_ipi,
+ __ipipe_ack_apic);
+
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(POSTED_INTR_VECTOR),
+ __ipipe_do_IRQ, smp_kvm_posted_intr_ipi,
+ __ipipe_ack_apic);
+#endif
+
+#if defined(CONFIG_X86_MCE_AMD) && defined(CONFIG_X86_64)
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(DEFERRED_ERROR_VECTOR),
+ __ipipe_do_IRQ, smp_deferred_error_interrupt,
+ __ipipe_ack_apic);
+#endif
+
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(SPURIOUS_APIC_VECTOR),
+ __ipipe_do_IRQ, smp_spurious_interrupt,
+ __ipipe_noack_apic);
+
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(ERROR_APIC_VECTOR),
+ __ipipe_do_IRQ, smp_error_interrupt,
+ __ipipe_ack_apic);
+
+#ifdef CONFIG_X86_THERMAL_VECTOR
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(THERMAL_APIC_VECTOR),
+ __ipipe_do_IRQ, smp_thermal_interrupt,
+ __ipipe_ack_apic);
+#endif /* CONFIG_X86_THERMAL_VECTOR */
+
+#ifdef CONFIG_X86_MCE_THRESHOLD
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(THRESHOLD_APIC_VECTOR),
+ __ipipe_do_IRQ, smp_threshold_interrupt,
+ __ipipe_ack_apic);
+#endif /* CONFIG_X86_MCE_THRESHOLD */
+
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(X86_PLATFORM_IPI_VECTOR),
+ __ipipe_do_IRQ, smp_x86_platform_ipi,
+ __ipipe_ack_apic);
+
+ /*
+ * We expose two high-priority APIC vectors which the head domain
+ * may use for high-resolution timing and SMP rescheduling,
+ * respectively. We should never receive them in the root domain.
+ */
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(IPIPE_HRTIMER_VECTOR),
+ __ipipe_do_IRQ, smp_spurious_interrupt,
+ __ipipe_ack_apic);
+
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(IPIPE_RESCHEDULE_VECTOR),
+ __ipipe_do_IRQ, smp_spurious_interrupt,
+ __ipipe_ack_apic);
+
+#ifdef CONFIG_IRQ_WORK
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(IRQ_WORK_VECTOR),
+ __ipipe_do_IRQ, smp_irq_work_interrupt,
+ __ipipe_ack_apic);
+#endif /* CONFIG_IRQ_WORK */
+
+#endif /* CONFIG_X86_LOCAL_APIC */
+
+#ifdef CONFIG_SMP
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(RESCHEDULE_VECTOR),
+ __ipipe_do_IRQ, smp_reschedule_interrupt,
+ __ipipe_ack_apic);
+
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(CALL_FUNCTION_VECTOR),
+ __ipipe_do_IRQ, smp_call_function_interrupt,
+ __ipipe_ack_apic);
+
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(CALL_FUNCTION_SINGLE_VECTOR),
+ __ipipe_do_IRQ, smp_call_function_single_interrupt,
+ __ipipe_ack_apic);
+
+ ipipe_request_irq(ipipe_root_domain,
+ IRQ_MOVE_CLEANUP_VECTOR,
+ __ipipe_do_IRQ, smp_irq_move_cleanup_interrupt,
+ __ipipe_ack_apic);
+
+ ipipe_request_irq(ipipe_root_domain,
+ ipipe_apic_vector_irq(REBOOT_VECTOR),
+ __ipipe_do_IRQ, smp_reboot_interrupt,
+ __ipipe_ack_apic);
+#endif /* CONFIG_SMP */
+
+ /*
+ * Finally, request the remaining ISA and IO-APIC
+ * interrupts. Interrupts which have already been requested
+ * will just beget a silent -EBUSY error, which is fine.
+ */
+ for (irq = 0; irq < IPIPE_NR_XIRQS; irq++)
+ ipipe_request_irq(ipipe_root_domain, irq,
+ __ipipe_do_IRQ, do_IRQ,
+ NULL);
+}
+
+#ifdef CONFIG_SMP
+int irq_activate(struct irq_desc *desc);
+
+int ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask)
+{
+ struct irq_desc *desc;
+ struct irq_chip *chip;
+ int err;
+
+ cpumask_and(&cpumask, &cpumask, cpu_online_mask);
+ if (cpumask_empty(&cpumask) || ipipe_virtual_irq_p(irq))
+ return -EINVAL;
+
+ desc = irq_to_desc(irq);
+ if (desc == NULL)
+ return -EINVAL;
+
+ chip = irq_desc_get_chip(desc);
+ if (chip->irq_set_affinity == NULL)
+ return -ENOSYS;
+
+ err = irq_activate(desc);
+ if (err)
+ return err;
+
+ chip->irq_set_affinity(irq_get_irq_data(irq), &cpumask, true);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ipipe_set_irq_affinity);
+
+void ipipe_send_ipi(unsigned int ipi, cpumask_t cpumask)
+{
+// unsigned long flags;
+
+// flags = hard_local_irq_save();
+
+ cpumask_clear_cpu(ipipe_processor_id(), &cpumask);
+ if (likely(!cpumask_empty(&cpumask)))
+ apic->send_IPI_mask(&cpumask, ipipe_apic_irq_vector(ipi));
+
+// hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ipipe_send_ipi);
+
+void __ipipe_hook_critical_ipi(struct ipipe_domain *ipd)
+{
+ unsigned int ipi = IPIPE_CRITICAL_IPI;
+
+ ipd->irqs[ipi].ackfn = __ipipe_ack_apic;
+ ipd->irqs[ipi].handler = __ipipe_do_critical_sync;
+ ipd->irqs[ipi].cookie = NULL;
+ ipd->irqs[ipi].control = IPIPE_HANDLE_MASK|IPIPE_STICKY_MASK;
+}
+
+#endif /* CONFIG_SMP */
+
+void __ipipe_halt_root(int use_mwait)
+{
+ struct ipipe_percpu_domain_data *p;
+
+ /* Emulate sti+hlt sequence over the root domain. */
+
+ hard_local_irq_disable();
+
+ p = ipipe_this_cpu_root_context();
+
+ trace_hardirqs_on();
+ __clear_bit(IPIPE_STALL_FLAG, &p->status);
+
+ if (unlikely(__ipipe_ipending_p(p))) {
+ __ipipe_sync_stage();
+ hard_local_irq_enable();
+ } else {
+#ifdef CONFIG_IPIPE_TRACE_IRQSOFF
+ ipipe_trace_end(0x8000000E);
+#endif /* CONFIG_IPIPE_TRACE_IRQSOFF */
+ if (use_mwait)
+ asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
+ :: "a" (0), "c" (0));
+ else
+ asm volatile("sti; hlt": : :"memory");
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_halt_root);
+
+static inline void __ipipe_fixup_if(bool stalled, struct pt_regs *regs)
+{
+ /*
+ * Have the saved hw state look like the domain stall bit, so
+ * that __ipipe_unstall_iret_root() restores the proper
+ * pipeline state for the root stage upon exit.
+ */
+ if (stalled)
+ regs->flags &= ~X86_EFLAGS_IF;
+ else
+ regs->flags |= X86_EFLAGS_IF;
+}
+
+dotraplinkage int __ipipe_trap_prologue(struct pt_regs *regs, int trapnr, unsigned long *flags)
+{
+ bool entry_irqs_off = hard_irqs_disabled();
+ struct ipipe_domain *ipd;
+ unsigned long cr2;
+
+ if (trapnr == X86_TRAP_PF)
+ cr2 = native_read_cr2();
+
+ /*
+ * KGDB and ftrace may poke int3/debug ops into the kernel
+ * code. Trap those exceptions early, do conditional fixups to
+ * the interrupt state depending on the current domain, let
+ * the regular handler see them.
+ */
+ if (unlikely(!user_mode(regs) &&
+ (trapnr == X86_TRAP_DB || trapnr == X86_TRAP_BP))) {
+
+ if (ipipe_root_p)
+ goto root_fixup;
+
+ /*
+ * Skip interrupt state fixup from the head domain,
+ * but do call the regular handler which is assumed to
+ * run fine within such context.
+ */
+ return -1;
+ }
+
+ /*
+ * Now that we have filtered out all debug traps which might
+ * happen anywhere in kernel code in theory, detect attempts
+ * to probe kernel memory (i.e. calls to probe_kernel_{read,
+ * write}()). If that happened over the head domain, do the
+ * fixup immediately then return right after upon success. If
+ * that fails, the kernel is likely to crash but let's follow
+ * the standard recovery procedure in that case anyway.
+ */
+ if (unlikely(!ipipe_root_p && faulthandler_disabled())) {
+ if (fixup_exception(regs, trapnr, regs->orig_ax, 0))
+ return 1;
+ }
+
+ if (unlikely(__ipipe_notify_trap(trapnr, regs)))
+ return 1;
+
+ if (likely(ipipe_root_p)) {
+ root_fixup:
+ /*
+ * If no head domain is installed, or in case we faulted in
+ * the iret path of x86-32, regs->flags does not match the root
+ * domain state. The fault handler may evaluate it. So fix this
+ * up with the current state.
+ */
+ local_save_flags(*flags);
+ __ipipe_fixup_if(raw_irqs_disabled_flags(*flags), regs);
+
+ /*
+ * Sync Linux interrupt state with hardware state on
+ * entry.
+ */
+ if (entry_irqs_off)
+ local_irq_disable();
+ } else {
+ /* Plan for restoring the original flags at fault. */
+ *flags = regs->flags;
+
+ /*
+ * Detect unhandled faults over the head domain,
+ * switching to root so that it can handle the fault
+ * cleanly.
+ */
+ hard_local_irq_disable();
+ ipd = __ipipe_current_domain;
+ __ipipe_set_current_domain(ipipe_root_domain);
+
+ ipipe_trace_panic_freeze();
+
+ /*
+ * Prevent warnings of this debug checker to focus on the
+ * actual bug.
+ */
+ if (test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status))
+ ipipe_context_check_off();
+
+ /* Sync Linux interrupt state with hardware state on entry. */
+ if (entry_irqs_off)
+ local_irq_disable();
+
+ /* Always warn about user land and unfixable faults. */
+ if (user_mode(regs) ||
+ !search_exception_tables(instruction_pointer(regs))) {
+ printk(KERN_ERR "BUG: Unhandled exception over domain"
+ " %s at 0x%lx - switching to ROOT\n",
+ ipd->name, instruction_pointer(regs));
+ dump_stack();
+ } else if (IS_ENABLED(CONFIG_IPIPE_DEBUG)) {
+ /* Also report fixable ones when debugging is enabled. */
+ printk(KERN_WARNING "WARNING: Fixable exception over "
+ "domain %s at 0x%lx - switching to ROOT\n",
+ ipd->name, instruction_pointer(regs));
+ dump_stack();
+ }
+ }
+
+ if (trapnr == X86_TRAP_PF)
+ write_cr2(cr2);
+
+ return 0;
+}
+
+dotraplinkage
+void __ipipe_trap_epilogue(struct pt_regs *regs,
+ unsigned long flags, unsigned long regs_flags)
+{
+ ipipe_restore_root(raw_irqs_disabled_flags(flags));
+ __ipipe_fixup_if(raw_irqs_disabled_flags(regs_flags), regs);
+}
+
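+/*
+ * Translate a hardware vector into the interrupt number used by the
+ * pipeline. System vectors map through ipipe_apic_vector_irq(),
+ * external vectors resolve via the per-CPU vector_irq[] table.
+ * Returns a negative value for unexpected/spurious vectors.
+ */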
+static inline int __ipipe_irq_from_vector(int vector, int *irq)
+{
+ struct irq_desc *desc;
+
+ if (vector >= FIRST_SYSTEM_VECTOR) {
+ *irq = ipipe_apic_vector_irq(vector);
+ return 0;
+ }
+
+ desc = __this_cpu_read(vector_irq[vector]);
+ if (likely(!IS_ERR_OR_NULL(desc))) {
+ *irq = irq_desc_get_irq(desc);
+ return 0;
+ }
+
+ if (vector == IRQ_MOVE_CLEANUP_VECTOR) {
+ *irq = vector;
+ return 0;
+ }
+
+#ifdef CONFIG_X86_LOCAL_APIC
+ __ack_APIC_irq();
+#endif
+
+ if (desc == VECTOR_UNUSED) {
+ pr_emerg_ratelimited("%s: %d.%d Unexpected IRQ trap\n",
+ __func__, smp_processor_id(), vector);
+ } else {
+ __this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+ }
+
+ return -1;
+}
+
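+/*
+ * Main interrupt entry for the pipeline: resolve the incoming vector,
+ * snapshot the register frame for the timer tick, then dispatch the
+ * event to the per-domain handlers. A non-zero return value tells the
+ * low-level entry code that the root stage may run its pending work
+ * immediately; zero means delivery is deferred (the root stage is
+ * stalled or not current).
+ */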
+int __ipipe_handle_irq(struct pt_regs *regs)
+{
+ struct ipipe_percpu_data *p = __ipipe_raw_cpu_ptr(&ipipe_percpu);
+ int irq, vector = regs->orig_ax, flags = 0;
+ struct pt_regs *tick_regs;
+
+ if (likely(vector < 0)) {
+ if (__ipipe_irq_from_vector(~vector, &irq) < 0)
+ goto out;
+ } else { /* Software-generated. */
+ irq = vector;
+ flags = IPIPE_IRQF_NOACK;
+ }
+
+ ipipe_trace_irqbegin(irq, regs);
+
+ /*
+ * Given our deferred dispatching model for regular IRQs, we
+ * only record CPU regs for the last timer interrupt, so that
+ * the timer handler charges CPU times properly. It is assumed
+ * that no other interrupt handler cares for such information.
+ */
+ if (irq == p->hrtimer_irq || p->hrtimer_irq == -1) {
+ tick_regs = &p->tick_regs;
+ tick_regs->flags = regs->flags;
+ tick_regs->cs = regs->cs;
+ tick_regs->ip = regs->ip;
+ tick_regs->bp = regs->bp;
+#ifdef CONFIG_X86_64
+ tick_regs->ss = regs->ss;
+ tick_regs->sp = regs->sp;
+#endif
+ if (!__ipipe_root_p)
+ tick_regs->flags &= ~X86_EFLAGS_IF;
+ }
+
+ __ipipe_dispatch_irq(irq, flags);
+
+ if (user_mode(regs) && ipipe_test_thread_flag(TIP_MAYDAY))
+ __ipipe_call_mayday(regs);
+
+ ipipe_trace_irqend(irq, regs);
+
+out:
+ if (!__ipipe_root_p ||
+ test_bit(IPIPE_STALL_FLAG, &__ipipe_root_status))
+ return 0;
+
+ return 1;
+}
+
+void __ipipe_arch_share_current(int flags)
+{
+ struct task_struct *p = current;
+
+ /*
+ * Setup a clean extended FPU state for kernel threads.
+ */
+ if (p->mm == NULL)
+ memcpy(&p->thread.fpu.state,
+ &init_fpstate, fpu_kernel_xstate_size);
+}
+
+struct task_struct *__switch_to(struct task_struct *prev_p,
+ struct task_struct *next_p);
+EXPORT_SYMBOL_GPL(do_munmap);
+EXPORT_SYMBOL_GPL(__switch_to);
+EXPORT_SYMBOL_GPL(show_stack);
+
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+EXPORT_SYMBOL(tasklist_lock);
+#endif /* CONFIG_SMP || CONFIG_DEBUG_SPINLOCK */
+
+#if defined(CONFIG_CC_STACKPROTECTOR) && defined(CONFIG_X86_64)
+EXPORT_PER_CPU_SYMBOL_GPL(irq_stack_union);
+#endif
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 7dfd0185767c..7077388e4f1c 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -49,7 +49,7 @@ void ack_bad_irq(unsigned int irq)
* completely.
* But only ack when the APIC is enabled -AK
*/
- ack_APIC_irq();
+ __ack_APIC_irq();
}
#define irq_stats(x) (&per_cpu(irq_stat, x))
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 6b32ab009c19..16ea3c5aed6d 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -70,3 +70,30 @@ int irq_init_percpu_irqstack(unsigned int cpu)
return 0;
return map_irq_stack(cpu);
}
+
+#ifdef CONFIG_IPIPE
+
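+/*
+ * Root stage handler replaying an interrupt deferred by the pipeline:
+ * re-enter the regular kernel IRQ path, using either the generic IRQ
+ * flow (do_IRQ case) or the original vector handler, with the saved
+ * tick registers as the interrupted context.
+ */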
+void __ipipe_do_IRQ(unsigned int irq, void *cookie)
+{
+ struct pt_regs *regs = raw_cpu_ptr(&ipipe_percpu.tick_regs);
+ struct irq_desc *desc = irq_to_desc(irq);
+ struct pt_regs *old_regs = set_irq_regs(regs);
+ unsigned int (*handler)(struct pt_regs *regs);
+
+ handler = (typeof(handler))cookie;
+
+ __ipipe_move_root_irq(desc);
+
+ entering_irq();
+
+ if (handler == do_IRQ)
+ generic_handle_irq_desc(desc);
+ else
+ handler(regs);
+
+ exiting_irq();
+
+ set_irq_regs(old_regs);
+}
+
+#endif
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index c44fe7d8d9a4..c454c0312bd7 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -577,9 +577,9 @@ kgdb_notify(struct notifier_block *self, unsigned long cmd, void *ptr)
unsigned long flags;
int ret;
- local_irq_save(flags);
+ flags = hard_local_irq_save();
ret = __kgdb_notify(ptr, cmd);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
return ret;
}
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5bb001c0c771..50b7eff3f26b 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -488,6 +488,7 @@ static DEFINE_PER_CPU(unsigned long, nmi_cr2);
*/
static DEFINE_PER_CPU(int, update_debug_stack);
+#ifndef CONFIG_IPIPE
static bool notrace is_debug_stack(unsigned long addr)
{
struct cea_exception_stacks *cs = __this_cpu_read(cea_exception_stacks);
@@ -504,6 +505,9 @@ static bool notrace is_debug_stack(unsigned long addr)
return addr >= bot && addr < top;
}
NOKPROBE_SYMBOL(is_debug_stack);
+#else /* IPIPE */
+static bool notrace is_debug_stack(unsigned long addr) { return 0; }
+#endif
#endif
dotraplinkage notrace void
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b8de27bb6e09..0277c4d49fb9 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -116,8 +116,16 @@ void exit_thread(struct task_struct *tsk)
if (bp) {
struct tss_struct *tss = &per_cpu(cpu_tss_rw, get_cpu());
- t->io_bitmap_ptr = NULL;
+ /*
+ * The caller may be preempted via I-pipe: to make
+ * sure TIF_IO_BITMAP always denotes a valid I/O
+ * bitmap when set, we clear it _before_ the I/O
+ * bitmap pointer. No cache coherence issue ahead as
+ * migration is currently locked (the primary domain
+ * may never migrate either).
+ */
clear_thread_flag(TIF_IO_BITMAP);
+ t->io_bitmap_ptr = NULL;
/*
* Careful, clear this in the TSS too:
*/
@@ -426,7 +434,9 @@ static __always_inline void __speculation_ctrl_update(unsigned long tifp,
u64 msr = x86_spec_ctrl_base;
bool updmsr = false;
+#ifndef CONFIG_IPIPE
lockdep_assert_irqs_disabled();
+#endif
/* Handle change of TIF_SSBD depending on the mitigation method. */
if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) {
@@ -474,9 +484,9 @@ void speculation_ctrl_update(unsigned long tif)
unsigned long flags;
/* Forced update. Make sure all relevant TIF flags are different */
- local_irq_save(flags);
+ flags = hard_local_irq_save();
__speculation_ctrl_update(~tif, tif);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
/* Called from seccomp/prctl update */
@@ -589,7 +599,7 @@ bool xen_set_default_idle(void)
void stop_this_cpu(void *dummy)
{
- local_irq_disable();
+ hard_local_irq_disable();
/*
* Remove this CPU:
*/
@@ -689,7 +699,11 @@ static __cpuidle void mwait_idle(void)
__monitor((void *)&current_thread_info()->flags, 0, 0);
if (!need_resched())
+#ifdef CONFIG_IPIPE
+ __ipipe_halt_root(1);
+#else
__sti_mwait(0, 0);
+#endif
else
local_irq_enable();
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
@@ -749,6 +763,10 @@ void __init arch_post_acpi_subsys_init(void)
if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
mark_tsc_unstable("TSC halt in AMD C1E");
pr_info("System has AMD C1E enabled\n");
+#ifdef CONFIG_IPIPE
+ pr_info("I-pipe: will not be able to use LAPIC as a tick device\n"
+ "I-pipe: disable C1E power state in your BIOS\n");
+#endif
}
static int __init idle_setup(char *str)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b77f0d9dad55..e3022e782001 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -505,7 +505,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
struct thread_struct *prev = &prev_p->thread;
struct thread_struct *next = &next_p->thread;
- int cpu = smp_processor_id();
+ int cpu = raw_smp_processor_id();
WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
this_cpu_read(irq_count) != -1);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 31ee37f87b2b..57c4de9fc9fc 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -213,10 +213,10 @@ static void native_stop_other_cpus(int wait)
udelay(1);
}
- local_irq_save(flags);
+ flags = hard_local_irq_save();
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
/*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 1699d18bd154..681dc7c6e1bf 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1137,7 +1137,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
{
int apicid = apic->cpu_present_to_apicid(cpu);
int cpu0_nmi_registered = 0;
- unsigned long flags;
+ unsigned long vflags, rflags;
int err, ret = 0;
lockdep_assert_irqs_enabled();
@@ -1188,9 +1188,11 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
* Check TSC synchronization with the AP (keep irqs disabled
* while doing so):
*/
- local_irq_save(flags);
+ local_irq_save(vflags);
+ rflags = hard_local_irq_save();
check_tsc_sync_source(cpu);
- local_irq_restore(flags);
+ hard_local_irq_restore(rflags);
+ local_irq_restore(vflags);
while (!cpu_online(cpu)) {
cpu_relax();
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4bb0f8447112..e36568748768 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -14,6 +14,7 @@
#include <linux/context_tracking.h>
#include <linux/interrupt.h>
+#include <linux/ipipe.h>
#include <linux/kallsyms.h>
#include <linux/spinlock.h>
#include <linux/kprobes.h>
@@ -77,13 +78,13 @@ DECLARE_BITMAP(system_vectors, NR_VECTORS);
static inline void cond_local_irq_enable(struct pt_regs *regs)
{
if (regs->flags & X86_EFLAGS_IF)
- local_irq_enable();
+ hard_local_irq_enable_notrace();
}
static inline void cond_local_irq_disable(struct pt_regs *regs)
{
if (regs->flags & X86_EFLAGS_IF)
- local_irq_disable();
+ hard_local_irq_disable_notrace();
}
/*
@@ -529,7 +530,7 @@ do_general_protection(struct pt_regs *regs, long error_code)
}
if (v8086_mode(regs)) {
- local_irq_enable();
+ hard_local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
return;
}
@@ -912,7 +913,7 @@ NOKPROBE_SYMBOL(do_device_not_available);
dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
- local_irq_enable();
+ hard_local_irq_enable();
if (notify_die(DIE_TRAP, "iret exception", regs, error_code,
X86_TRAP_IRET, SIGILL) != NOTIFY_STOP) {
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index fe4200b89582..1c235369847c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -752,11 +752,11 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
* calibration, which will take at least 50ms, and
* read the end value.
*/
- local_irq_save(flags);
+ flags = hard_local_irq_save();
tsc1 = tsc_read_refs(&ref1, hpet);
tsc_pit_khz = pit_calibrate_tsc(latch, ms, loopmin);
tsc2 = tsc_read_refs(&ref2, hpet);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
/* Pick the lowest PIT TSC calibration so far */
tsc_pit_min = min(tsc_pit_min, tsc_pit_khz);
@@ -865,9 +865,9 @@ unsigned long native_calibrate_cpu_early(void)
if (!fast_calibrate)
fast_calibrate = cpu_khz_from_msr();
if (!fast_calibrate) {
- local_irq_save(flags);
+ flags = hard_local_irq_save();
fast_calibrate = quick_pit_calibrate();
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
return fast_calibrate;
}
@@ -1130,7 +1130,7 @@ static struct clocksource clocksource_tsc_early = {
* this one will immediately take over. We will only register if TSC has
* been found good.
*/
-static struct clocksource clocksource_tsc = {
+struct clocksource clocksource_tsc = {
.name = "tsc",
.rating = 300,
.read = read_tsc,
@@ -1317,6 +1317,7 @@ static void tsc_refine_calibration_work(struct work_struct *work)
u64 tsc_stop, ref_stop, delta;
unsigned long freq;
int cpu;
+ unsigned int ipipe_freq;
/* Don't bother refining TSC on unstable systems */
if (tsc_unstable)
@@ -1368,6 +1369,9 @@ restart:
/* Inform the TSC deadline clockevent devices about the recalibration */
lapic_update_tsc_freq();
+ ipipe_freq = tsc_khz * 1000;
+ __ipipe_report_clockfreq_update(ipipe_freq);
+
/* Update the sched_clock() rate to match the clocksource one */
for_each_possible_cpu(cpu)
set_cyc2ns_scale(tsc_khz, cpu, tsc_stop);
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index a76c12b38e92..d28e32e79092 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -147,12 +147,14 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
}
preempt_disable();
+ hard_cond_local_irq_disable();
tsk->thread.sp0 = vm86->saved_sp0;
tsk->thread.sysenter_cs = __KERNEL_CS;
update_task_stack(tsk);
refresh_sysenter_cs(&tsk->thread);
vm86->saved_sp0 = 0;
preempt_enable();
+ hard_cond_local_irq_enable();
memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));
@@ -365,6 +367,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
vm86->saved_sp0 = tsk->thread.sp0;
lazy_save_gs(vm86->regs32.gs);
+ hard_cond_local_irq_disable();
/* make room for real-mode segments */
preempt_disable();
tsk->thread.sp0 += 16;
@@ -376,6 +379,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
update_task_stack(tsk);
preempt_enable();
+ hard_cond_local_irq_enable();
if (vm86->flags & VM86_SCREEN_BITMAP)
mark_screen_rdonly(tsk->mm);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1a9fa2903852..faacb12bcc66 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1105,23 +1105,29 @@ static void fetch_register_operand(struct operand *op)
}
}
-static void emulator_get_fpu(void)
+static unsigned long emulator_get_fpu(void)
{
- fpregs_lock();
+ unsigned long flags;
+
+ flags = fpregs_lock();
fpregs_assert_state_consistent();
if (test_thread_flag(TIF_NEED_FPU_LOAD))
switch_fpu_return();
+
+ return flags;
}
-static void emulator_put_fpu(void)
+static void emulator_put_fpu(unsigned long flags)
{
- fpregs_unlock();
+ fpregs_unlock(flags);
}
static void read_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data, int reg)
{
- emulator_get_fpu();
+ unsigned long flags;
+
+ flags = emulator_get_fpu();
switch (reg) {
case 0: asm("movdqa %%xmm0, %0" : "=m"(*data)); break;
case 1: asm("movdqa %%xmm1, %0" : "=m"(*data)); break;
@@ -1143,13 +1149,15 @@ static void read_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data, int reg)
#endif
default: BUG();
}
- emulator_put_fpu();
+ emulator_put_fpu(flags);
}
static void write_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data,
int reg)
{
- emulator_get_fpu();
+ unsigned long flags;
+
+ flags = emulator_get_fpu();
switch (reg) {
case 0: asm("movdqa %0, %%xmm0" : : "m"(*data)); break;
case 1: asm("movdqa %0, %%xmm1" : : "m"(*data)); break;
@@ -1171,12 +1179,14 @@ static void write_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data,
#endif
default: BUG();
}
- emulator_put_fpu();
+ emulator_put_fpu(flags);
}
static void read_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
{
- emulator_get_fpu();
+ unsigned long flags;
+
+ flags = emulator_get_fpu();
switch (reg) {
case 0: asm("movq %%mm0, %0" : "=m"(*data)); break;
case 1: asm("movq %%mm1, %0" : "=m"(*data)); break;
@@ -1188,12 +1198,14 @@ static void read_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
case 7: asm("movq %%mm7, %0" : "=m"(*data)); break;
default: BUG();
}
- emulator_put_fpu();
+ emulator_put_fpu(flags);
}
static void write_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
{
- emulator_get_fpu();
+ unsigned long flags;
+
+ flags = emulator_get_fpu();
switch (reg) {
case 0: asm("movq %0, %%mm0" : : "m"(*data)); break;
case 1: asm("movq %0, %%mm1" : : "m"(*data)); break;
@@ -1205,30 +1217,33 @@ static void write_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
case 7: asm("movq %0, %%mm7" : : "m"(*data)); break;
default: BUG();
}
- emulator_put_fpu();
+ emulator_put_fpu(flags);
}
static int em_fninit(struct x86_emulate_ctxt *ctxt)
{
+ unsigned long flags;
+
if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
return emulate_nm(ctxt);
- emulator_get_fpu();
+ flags = emulator_get_fpu();
asm volatile("fninit");
- emulator_put_fpu();
+ emulator_put_fpu(flags);
return X86EMUL_CONTINUE;
}
static int em_fnstcw(struct x86_emulate_ctxt *ctxt)
{
+ unsigned long flags;
u16 fcw;
if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
return emulate_nm(ctxt);
- emulator_get_fpu();
+ flags = emulator_get_fpu();
asm volatile("fnstcw %0": "+m"(fcw));
- emulator_put_fpu();
+ emulator_put_fpu(flags);
ctxt->dst.val = fcw;
@@ -1237,14 +1252,15 @@ static int em_fnstcw(struct x86_emulate_ctxt *ctxt)
static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
{
+ unsigned long flags;
u16 fsw;
if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
return emulate_nm(ctxt);
- emulator_get_fpu();
+ flags = emulator_get_fpu();
asm volatile("fnstsw %0": "+m"(fsw));
- emulator_put_fpu();
+ emulator_put_fpu(flags);
ctxt->dst.val = fsw;
@@ -4172,17 +4188,18 @@ static inline size_t fxstate_size(struct x86_emulate_ctxt *ctxt)
static int em_fxsave(struct x86_emulate_ctxt *ctxt)
{
struct fxregs_state fx_state;
+ unsigned long flags;
int rc;
rc = check_fxsr(ctxt);
if (rc != X86EMUL_CONTINUE)
return rc;
- emulator_get_fpu();
+ flags = emulator_get_fpu();
rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state));
- emulator_put_fpu();
+ emulator_put_fpu(flags);
if (rc != X86EMUL_CONTINUE)
return rc;
@@ -4214,6 +4231,7 @@ static noinline int fxregs_fixup(struct fxregs_state *fx_state,
static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
{
struct fxregs_state fx_state;
+ unsigned long flags;
int rc;
size_t size;
@@ -4226,7 +4244,7 @@ static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;
- emulator_get_fpu();
+ flags = emulator_get_fpu();
if (size < __fxstate_size(16)) {
rc = fxregs_fixup(&fx_state, size);
@@ -4243,7 +4261,7 @@ static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
rc = asm_safe("fxrstor %[fx]", : [fx] "m"(fx_state));
out:
- emulator_put_fpu();
+ emulator_put_fpu(flags);
return rc;
}
@@ -5558,11 +5576,12 @@ static bool string_insn_completed(struct x86_emulate_ctxt *ctxt)
static int flush_pending_x87_faults(struct x86_emulate_ctxt *ctxt)
{
+ unsigned long flags;
int rc;
- emulator_get_fpu();
+ flags = emulator_get_fpu();
rc = asm_safe("fwait");
- emulator_put_fpu();
+ emulator_put_fpu(flags);
if (unlikely(rc != X86EMUL_CONTINUE))
return emulate_exception(ctxt, MF_VECTOR, 0, false);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c5a9de8d0725..2818dbf4e7b7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5716,7 +5716,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
*/
x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
- local_irq_enable();
+ hard_local_irq_enable();
asm volatile (
"push %%" _ASM_BP "; \n\t"
@@ -5840,7 +5840,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
reload_tss(vcpu);
- local_irq_disable();
+ hard_local_irq_disable();
x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e6dd6a7e8689..dcf03db5d40e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1293,19 +1293,23 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
#ifdef CONFIG_X86_64
static u64 vmx_read_guest_kernel_gs_base(struct vcpu_vmx *vmx)
{
- preempt_disable();
+ unsigned long flags;
+
+ flags = hard_preempt_disable();
if (vmx->guest_state_loaded)
rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
- preempt_enable();
+ hard_preempt_enable(flags);
return vmx->msr_guest_kernel_gs_base;
}
static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data)
{
- preempt_disable();
+ unsigned long flags;
+
+ flags = hard_preempt_disable();
if (vmx->guest_state_loaded)
wrmsrl(MSR_KERNEL_GS_BASE, data);
- preempt_enable();
+ hard_preempt_enable(flags);
vmx->msr_guest_kernel_gs_base = data;
}
#endif
@@ -1745,6 +1749,7 @@ static void setup_msrs(struct vcpu_vmx *vmx)
{
int save_nmsrs, index;
+ hard_cond_local_irq_disable();
save_nmsrs = 0;
#ifdef CONFIG_X86_64
/*
@@ -1772,6 +1777,7 @@ static void setup_msrs(struct vcpu_vmx *vmx)
vmx->save_nmsrs = save_nmsrs;
vmx->guest_msrs_ready = false;
+ hard_cond_local_irq_enable();
if (cpu_has_vmx_msr_bitmap())
vmx_update_msr_bitmap(&vmx->vcpu);
@@ -2246,9 +2252,22 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
u64 old_msr_data = msr->data;
msr->data = data;
if (msr - vmx->guest_msrs < vmx->save_nmsrs) {
+ unsigned long flags;
+
preempt_disable();
+ flags = hard_cond_local_irq_save();
+ /*
+ * This may be called without an I-pipe notifier
+ * registered, i.e. outside of vcpu_run. In
+ * that case, shared MSRs may be set to guest
+ * state while I-pipe will have no chance to
+ * restore them when interrupting afterwards.
+ * Therefore register the notifier.
+ */
+ __ipipe_enter_vm(&vcpu->ipipe_notifier);
ret = kvm_set_shared_msr(msr->index, msr->data,
msr->mask);
+ hard_cond_local_irq_restore(flags);
preempt_enable();
if (ret)
msr->data = old_msr_data;
@@ -6848,7 +6867,9 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
vmx_vcpu_load(&vmx->vcpu, cpu);
vmx->vcpu.cpu = cpu;
vmx_vcpu_setup(vmx);
+ hard_cond_local_irq_disable();
vmx_vcpu_put(&vmx->vcpu);
+ hard_cond_local_irq_enable();
put_cpu();
if (cpu_need_virtualize_apic_accesses(&vmx->vcpu)) {
err = alloc_apic_access_page(kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f5e9590a8f31..fc87893224f9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -39,6 +39,7 @@
#include <linux/iommu.h>
#include <linux/intel-iommu.h>
#include <linux/cpufreq.h>
+#include <linux/ipipe.h>
#include <linux/user-return-notifier.h>
#include <linux/srcu.h>
#include <linux/slab.h>
@@ -171,6 +172,7 @@ struct kvm_shared_msrs_global {
struct kvm_shared_msrs {
struct user_return_notifier urn;
bool registered;
+ bool dirty;
struct kvm_shared_msr_values {
u64 host;
u64 curr;
@@ -236,12 +238,31 @@ static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
vcpu->arch.apf.gfns[i] = ~0;
}
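+/*
+ * Restore the host values of any shared MSRs the guest left modified,
+ * running with hard interrupts off so the head domain cannot preempt
+ * the update sequence.
+ */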
+static void kvm_restore_shared_msrs(struct kvm_shared_msrs *locals)
+{
+ struct kvm_shared_msr_values *values;
+ unsigned long flags;
+ unsigned int slot;
+
+ flags = hard_cond_local_irq_save();
+ if (locals->dirty) {
+ for (slot = 0; slot < shared_msrs_global.nr; ++slot) {
+ values = &locals->values[slot];
+ if (values->host != values->curr) {
+ wrmsrl(shared_msrs_global.msrs[slot],
+ values->host);
+ values->curr = values->host;
+ }
+ }
+ locals->dirty = false;
+ }
+ hard_cond_local_irq_restore(flags);
+}
+
static void kvm_on_user_return(struct user_return_notifier *urn)
{
- unsigned slot;
struct kvm_shared_msrs *locals
= container_of(urn, struct kvm_shared_msrs, urn);
- struct kvm_shared_msr_values *values;
unsigned long flags;
/*
@@ -254,13 +275,8 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
user_return_notifier_unregister(urn);
}
local_irq_restore(flags);
- for (slot = 0; slot < shared_msrs_global.nr; ++slot) {
- values = &locals->values[slot];
- if (values->host != values->curr) {
- wrmsrl(shared_msrs_global.msrs[slot], values->host);
- values->curr = values->host;
- }
- }
+ kvm_restore_shared_msrs(locals);
+ __ipipe_exit_vm();
}
static void shared_msr_update(unsigned slot, u32 msr)
@@ -311,6 +327,7 @@ int kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
return 1;
smsr->values[slot].curr = value;
+ smsr->dirty = true;
if (!smsr->registered) {
smsr->urn.on_user_return = kvm_on_user_return;
user_return_notifier_register(&smsr->urn);
@@ -3582,11 +3599,23 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
+ unsigned long flags;
int idx;
if (vcpu->preempted)
vcpu->arch.preempted_in_kernel = !kvm_x86_ops->get_cpl(vcpu);
+ flags = hard_cond_local_irq_save();
+
+ /*
+ * Do not update steal time accounting while running over the head
+ * domain as this may introduce high latencies and will also issue
+ * context violation reports. The code will be executed when kvm does
+ * the regular kvm_arch_vcpu_put, after returning from the head domain.
+ */
+ if (!ipipe_root_p)
+ goto skip_steal_time_update;
+
/*
* Disable page faults because we're in atomic context here.
* kvm_write_guest_offset_cached() would call might_fault()
@@ -3604,6 +3633,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_steal_time_set_preempted(vcpu);
srcu_read_unlock(&vcpu->kvm->srcu, idx);
pagefault_enable();
+skip_steal_time_update:
kvm_x86_ops->vcpu_put(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
/*
@@ -3612,7 +3642,42 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
* guest. do_debug expects dr6 to be cleared after it runs, do the same.
*/
set_debugreg(0, 6);
+
+#ifdef CONFIG_IPIPE
+ vcpu->ipipe_put_vcpu = false;
+ if (!per_cpu_ptr(shared_msrs, smp_processor_id())->dirty)
+ __ipipe_exit_vm();
+#endif
+
+ hard_cond_local_irq_restore(flags);
+}
+
+#ifdef CONFIG_IPIPE
+
+void __ipipe_handle_vm_preemption(struct ipipe_vm_notifier *nfy)
+{
+ unsigned int cpu = raw_smp_processor_id();
+ struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
+ struct kvm_vcpu *vcpu;
+
+ vcpu = container_of(nfy, struct kvm_vcpu, ipipe_notifier);
+
+ /*
+ * We may leave kvm_arch_vcpu_put with the ipipe notifier still
+ * registered in case shared MSRs are still active. If a VM preemption
+ * hits us after that point but before the user return notifier fired,
+ * we may run kvm_arch_vcpu_put again from here. Do not rely on this
+ * being harmless and rather use a flag to decide if the run is needed.
+ */
+ if (vcpu->ipipe_put_vcpu)
+ kvm_arch_vcpu_put(vcpu);
+
+ kvm_restore_shared_msrs(smsr);
+ __ipipe_exit_vm();
}
+EXPORT_SYMBOL_GPL(__ipipe_handle_vm_preemption);
+
+#endif
static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
@@ -8235,6 +8300,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
}
preempt_disable();
+ local_irq_disable();
+ hard_cond_local_irq_disable();
+
+#ifdef CONFIG_IPIPE
+ __ipipe_enter_vm(&vcpu->ipipe_notifier);
+ vcpu->ipipe_put_vcpu = true;
+#endif
kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -8243,7 +8315,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
* IPI are then delayed after guest entry, which ensures that they
* result in virtual interrupt delivery.
*/
- local_irq_disable();
vcpu->mode = IN_GUEST_MODE;
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
@@ -8273,6 +8344,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
|| need_resched() || signal_pending(current)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
smp_wmb();
+ hard_cond_local_irq_enable();
local_irq_enable();
preempt_enable();
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
@@ -8341,6 +8413,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_x86_ops->handle_exit_irqoff(vcpu);
+ hard_cond_local_irq_enable();
+
/*
* Consume any pending interrupts, including the possible source of
* VM-Exit on SVM and any ticks that occur between VM-Exit and now.
@@ -8583,7 +8657,9 @@ static void kvm_save_current_fpu(struct fpu *fpu)
/* Swap (qemu) user FPU context for the guest FPU context. */
static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
- fpregs_lock();
+ unsigned long flags;
+
+ flags = fpregs_lock();
kvm_save_current_fpu(vcpu->arch.user_fpu);
@@ -8592,7 +8668,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
~XFEATURE_MASK_PKRU);
fpregs_mark_activate();
- fpregs_unlock();
+ fpregs_unlock(flags);
trace_kvm_fpu(1);
}
@@ -8600,14 +8676,16 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
/* When vcpu_run ends, restore user space FPU context. */
static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
{
- fpregs_lock();
+ unsigned long flags;
+
+ flags = fpregs_lock();
kvm_save_current_fpu(vcpu->arch.guest_fpu);
copy_kernel_to_fpregs(&vcpu->arch.user_fpu->state);
fpregs_mark_activate();
- fpregs_unlock();
+ fpregs_unlock(flags);
++vcpu->stat.fpu_reload;
trace_kvm_fpu(0);
@@ -9217,6 +9295,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
"guest TSC will not be reliable\n");
vcpu = kvm_x86_ops->vcpu_create(kvm, id);
+#ifdef CONFIG_IPIPE
+ vcpu->ipipe_notifier.handler = __ipipe_handle_vm_preemption;
+#endif
return vcpu;
}
diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 419365c48b2a..4a990d5b30a8 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -41,7 +41,7 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
void *p;
int i;
- if (unlikely(in_interrupt()))
+ if (unlikely(!ipipe_root_p || in_interrupt()))
return __memcpy(to, from, len);
p = to;
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index 3f435d7fca5e..1168c90acd88 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -5,6 +5,7 @@
*/
#include <linux/uaccess.h>
+#include <linux/ipipe.h>
#include <linux/export.h>
#include <asm/tlbflush.h>
@@ -18,7 +19,7 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
{
unsigned long ret;
- if (__range_not_ok(from, n, TASK_SIZE))
+ if (!ipipe_root_p || __range_not_ok(from, n, TASK_SIZE))
return n;
if (!nmi_uaccess_okay())
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c494c8c05824..0012982fb176 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1516,6 +1516,12 @@ static noinline void
__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
unsigned long address)
{
+#ifdef CONFIG_IPIPE
+ if (ipipe_root_domain != ipipe_head_domain) {
+ trace_hardirqs_on();
+ hard_local_irq_enable();
+ }
+#endif
prefetchw(&current->mm->mmap_sem);
if (unlikely(kmmio_fault(regs, address)))
@@ -1553,3 +1559,50 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long addr
exception_exit(prev_state);
}
NOKPROBE_SYMBOL(do_page_fault);
+
+#ifdef CONFIG_IPIPE
+
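+/*
+ * Propagate kernel mappings for the given address range to every page
+ * directory, so the head domain never takes a vmalloc fault when
+ * touching memory in that range.
+ */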
+void __ipipe_pin_mapping_globally(unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_X86_32
+ unsigned long next, addr = start;
+
+ do {
+ unsigned long flags;
+ struct page *page;
+
+ next = pgd_addr_end(addr, end);
+ spin_lock_irqsave(&pgd_lock, flags);
+ list_for_each_entry(page, &pgd_list, lru)
+ vmalloc_sync_one(page_address(page), addr);
+ spin_unlock_irqrestore(&pgd_lock, flags);
+
+ } while (addr = next, addr != end);
+#else
+ unsigned long next, addr = start;
+ pgd_t *pgd, *pgd_ref;
+ struct page *page;
+
+ if (!(start >= VMALLOC_START && start < VMALLOC_END))
+ return;
+
+ do {
+ next = pgd_addr_end(addr, end);
+ pgd_ref = pgd_offset_k(addr);
+ if (pgd_none(*pgd_ref))
+ continue;
+ spin_lock(&pgd_lock);
+ list_for_each_entry(page, &pgd_list, lru) {
+ pgd = page_address(page) + pgd_index(addr);
+ if (pgd_none(*pgd))
+ set_pgd(pgd, *pgd_ref);
+ }
+ spin_unlock(&pgd_lock);
+ addr = next;
+ } while (addr != end);
+
+ arch_flush_lazy_mmu_mode();
+#endif
+}
+
+#endif /* CONFIG_IPIPE */
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index c7c4e2f8c6a5..ad2d7c849eec 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -772,6 +772,7 @@ void io_free_memtype(resource_size_t start, resource_size_t end)
free_memtype(start, end);
}
+#ifdef CONFIG_X86_PAT
int arch_io_reserve_memtype_wc(resource_size_t start, resource_size_t size)
{
enum page_cache_mode type = _PAGE_CACHE_MODE_WC;
@@ -785,6 +786,7 @@ void arch_io_free_memtype_wc(resource_size_t start, resource_size_t size)
io_free_memtype(start, start + size);
}
EXPORT_SYMBOL(arch_io_free_memtype_wc);
+#endif
pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t vma_prot)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 851359b7edc5..a615642062e5 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -156,9 +156,9 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
{
unsigned long flags;
- local_irq_save(flags);
+ flags = hard_local_irq_save();
switch_mm_irqs_off(prev, next, tsk);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
static void sync_current_stack_to_mm(struct mm_struct *mm)
@@ -278,7 +278,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
bool was_lazy = this_cpu_read(cpu_tlbstate.is_lazy);
- unsigned cpu = smp_processor_id();
+ unsigned cpu = raw_smp_processor_id();
u64 next_tlb_gen;
bool need_flush;
u16 new_asid;
@@ -292,8 +292,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
* NB: leave_mm() calls us with prev == NULL and tsk == NULL.
*/
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_IPIPE_DEBUG_INTERNAL) &&
+ !hard_irqs_disabled());
+
/* We don't want flush_tlb_func_* to run concurrently with us. */
- if (IS_ENABLED(CONFIG_PROVE_LOCKING))
+ if (!IS_ENABLED(CONFIG_IPIPE) && IS_ENABLED(CONFIG_PROVE_LOCKING))
WARN_ON_ONCE(!irqs_disabled());
/*
@@ -536,16 +539,27 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
* - f->new_tlb_gen: the generation that the requester of the flush
* wants us to catch up to.
*/
- struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
- u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
- u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
- u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
+ struct mm_struct *loaded_mm;
+ u32 loaded_mm_asid;
+ u64 mm_tlb_gen;
+ u64 local_tlb_gen;
+ unsigned long flags;
/* This code cannot presently handle being reentered. */
VM_WARN_ON(!irqs_disabled());
- if (unlikely(loaded_mm == &init_mm))
+ flags = hard_cond_local_irq_save();
+
+ loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+ loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+ mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
+ loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+ local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
+
+ if (unlikely(loaded_mm == &init_mm)) {
+ hard_cond_local_irq_restore(flags);
return;
+ }
VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) !=
loaded_mm->context.ctx_id);
@@ -561,10 +575,12 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
* IPIs to lazy TLB mode CPUs.
*/
switch_mm_irqs_off(NULL, &init_mm, NULL);
+ hard_cond_local_irq_restore(flags);
return;
}
if (unlikely(local_tlb_gen == mm_tlb_gen)) {
+ hard_cond_local_irq_restore(flags);
/*
* There's nothing to do: we're already up to date. This can
* happen if two concurrent flushes happen -- the first flush to
@@ -578,6 +594,8 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
WARN_ON_ONCE(local_tlb_gen > mm_tlb_gen);
WARN_ON_ONCE(f->new_tlb_gen > mm_tlb_gen);
+ hard_cond_local_irq_restore(flags);
+
/*
* If we get to this point, we know that our TLB is out of date.
* This does not strictly imply that we need to flush (it's
@@ -637,8 +655,13 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
trace_tlb_flush(reason, TLB_FLUSH_ALL);
}
- /* Both paths above update our state to mm_tlb_gen. */
- this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
+ flags = hard_cond_local_irq_save();
+ if (loaded_mm == this_cpu_read(cpu_tlbstate.loaded_mm) &&
+ loaded_mm_asid == this_cpu_read(cpu_tlbstate.loaded_mm_asid)) {
+ /* Both paths above update our state to mm_tlb_gen. */
+ this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
+ }
+ hard_cond_local_irq_restore(flags);
}
static void flush_tlb_func_local(const void *info, enum tlb_flush_reason reason)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 1b016fdd1a75..f63e7bcedcaf 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3348,6 +3348,17 @@ EXPORT_SYMBOL(dev_printk_emit);
static void __dev_printk(const char *level, const struct device *dev,
struct va_format *vaf)
{
+#ifdef CONFIG_IPIPE
+ /*
+ * Defer console output to the I-pipe log when hard IRQs are
+ * off, or when running over the head stage.
+ */
+ if (hard_irqs_disabled() || !ipipe_root_p) {
+ __ipipe_log_printk(vaf->fmt, *vaf->va);
+ return;
+ }
+#endif
+
if (dev)
dev_printk_emit(level[1] - '0', dev, "%s %s: %pV",
dev_driver_string(dev), dev_name(dev), vaf);
diff --git a/drivers/base/regmap/regmap-irq.c b/drivers/base/regmap/regmap-irq.c
index 3c1e554df4eb..2bcf7c5adfbb 100644
--- a/drivers/base/regmap/regmap-irq.c
+++ b/drivers/base/regmap/regmap-irq.c
@@ -216,6 +216,7 @@ static void regmap_irq_enable(struct irq_data *data)
const struct regmap_irq *irq_data = irq_to_regmap_irq(d, data->hwirq);
unsigned int reg = irq_data->reg_offset / map->reg_stride;
unsigned int mask, type;
+ unsigned long flags;
type = irq_data->type.type_falling_val | irq_data->type.type_rising_val;
@@ -238,7 +239,9 @@ static void regmap_irq_enable(struct irq_data *data)
if (d->chip->clear_on_unmask)
d->clear_status = true;
+ flags = hard_cond_local_irq_save();
d->mask_buf[reg] &= ~mask;
+ hard_cond_local_irq_restore(flags);
}
static void regmap_irq_disable(struct irq_data *data)
@@ -246,8 +249,11 @@ static void regmap_irq_disable(struct irq_data *data)
struct regmap_irq_chip_data *d = irq_data_get_irq_chip_data(data);
struct regmap *map = d->map;
const struct regmap_irq *irq_data = irq_to_regmap_irq(d, data->hwirq);
+ unsigned long flags;
+ flags = hard_cond_local_irq_save();
d->mask_buf[irq_data->reg_offset / map->reg_stride] |= irq_data->mask;
+ hard_cond_local_irq_restore(flags);
}
static int regmap_irq_set_type(struct irq_data *data, unsigned int type)
@@ -325,6 +331,7 @@ static const struct irq_chip regmap_irq_chip = {
.irq_enable = regmap_irq_enable,
.irq_set_type = regmap_irq_set_type,
.irq_set_wake = regmap_irq_set_wake,
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static inline int read_sub_irq_data(struct regmap_irq_chip_data *data,
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index ec6f28ed21e2..a4c52d10ada3 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -17,6 +17,8 @@
#include <linux/clockchips.h>
#include <linux/clocksource.h>
#include <linux/interrupt.h>
+#include <linux/ipipe.h>
+#include <linux/ipipe_tickdev.h>
#include <linux/of_irq.h>
#include <linux/of_address.h>
#include <linux/io.h>
@@ -631,8 +633,7 @@ static bool arch_timer_counter_has_wa(void)
#define arch_timer_counter_has_wa() ({false;})
#endif /* CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND */
-static __always_inline irqreturn_t timer_handler(const int access,
- struct clock_event_device *evt)
+static int arch_timer_ack(const int access, struct clock_event_device *evt)
{
unsigned long ctrl;
@@ -640,6 +641,52 @@ static __always_inline irqreturn_t timer_handler(const int access,
if (ctrl & ARCH_TIMER_CTRL_IT_STAT) {
ctrl |= ARCH_TIMER_CTRL_IT_MASK;
arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, evt);
+ return 1;
+ }
+ return 0;
+}
+
+#ifdef CONFIG_IPIPE
+static DEFINE_PER_CPU(struct ipipe_timer, arch_itimer);
+static struct __ipipe_tscinfo tsc_info = {
+ .type = IPIPE_TSC_TYPE_FREERUNNING_ARCH,
+ .u = {
+ {
+ .mask = 0xffffffffffffffff,
+ },
+ },
+};
+
+static void arch_itimer_ack_phys(void)
+{
+ struct clock_event_device *evt = this_cpu_ptr(arch_timer_evt);
+ arch_timer_ack(ARCH_TIMER_PHYS_ACCESS, evt);
+}
+
+static void arch_itimer_ack_virt(void)
+{
+ struct clock_event_device *evt = this_cpu_ptr(arch_timer_evt);
+ arch_timer_ack(ARCH_TIMER_VIRT_ACCESS, evt);
+}
+#endif /* CONFIG_IPIPE */
+
+static inline irqreturn_t timer_handler(int irq, const int access,
+ struct clock_event_device *evt)
+{
+ if (clockevent_ipipe_stolen(evt))
+ goto stolen;
+
+ if (arch_timer_ack(access, evt)) {
+#ifdef CONFIG_IPIPE
+ struct ipipe_timer *itimer = raw_cpu_ptr(&arch_itimer);
+ if (itimer->irq != irq)
+ itimer->irq = irq;
+#endif /* CONFIG_IPIPE */
+ stolen:
+ /*
+ * This is a 64bit clock source, no need for TSC
+ * update.
+ */
evt->event_handler(evt);
return IRQ_HANDLED;
}
@@ -651,28 +698,28 @@ static irqreturn_t arch_timer_handler_virt(int irq, void *dev_id)
{
struct clock_event_device *evt = dev_id;
- return timer_handler(ARCH_TIMER_VIRT_ACCESS, evt);
+ return timer_handler(irq, ARCH_TIMER_VIRT_ACCESS, evt);
}
static irqreturn_t arch_timer_handler_phys(int irq, void *dev_id)
{
struct clock_event_device *evt = dev_id;
- return timer_handler(ARCH_TIMER_PHYS_ACCESS, evt);
+ return timer_handler(irq, ARCH_TIMER_PHYS_ACCESS, evt);
}
static irqreturn_t arch_timer_handler_phys_mem(int irq, void *dev_id)
{
struct clock_event_device *evt = dev_id;
- return timer_handler(ARCH_TIMER_MEM_PHYS_ACCESS, evt);
+ return timer_handler(irq, ARCH_TIMER_MEM_PHYS_ACCESS, evt);
}
static irqreturn_t arch_timer_handler_virt_mem(int irq, void *dev_id)
{
struct clock_event_device *evt = dev_id;
- return timer_handler(ARCH_TIMER_MEM_VIRT_ACCESS, evt);
+ return timer_handler(irq, ARCH_TIMER_MEM_VIRT_ACCESS, evt);
}
static __always_inline int timer_shutdown(const int access,
@@ -756,6 +803,18 @@ static void __arch_timer_setup(unsigned type,
arch_timer_check_ool_workaround(ate_match_local_cap_id, NULL);
+#ifdef CONFIG_IPIPE
+ clk->ipipe_timer = raw_cpu_ptr(&arch_itimer);
+ if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
+ clk->ipipe_timer->irq = arch_timer_ppi[ARCH_TIMER_VIRT_PPI];
+ clk->ipipe_timer->ack = arch_itimer_ack_virt;
+ } else {
+ clk->ipipe_timer->irq = arch_timer_ppi[ARCH_TIMER_PHYS_SECURE_PPI];
+ clk->ipipe_timer->ack = arch_itimer_ack_phys;
+ }
+ clk->ipipe_timer->freq = arch_timer_rate;
+#endif
+
if (arch_timer_c3stop)
clk->features |= CLOCK_EVT_FEAT_C3STOP;
clk->name = "arch_sys_timer";
@@ -860,6 +919,9 @@ static void arch_counter_set_user_access(void)
else
cntkctl |= ARCH_TIMER_USR_VCT_ACCESS_EN;
+#ifdef CONFIG_IPIPE
+ cntkctl |= ARCH_TIMER_USR_PCT_ACCESS_EN;
+#endif
arch_timer_set_cntkctl(cntkctl);
}
@@ -1004,6 +1066,10 @@ static void __init arch_counter_register(unsigned type)
arch_timer_read_counter = arch_counter_get_cntvct_mem;
}
+#ifdef CONFIG_IPIPE
+ tsc_info.freq = arch_timer_rate;
+ __ipipe_tsc_register(&tsc_info);
+#endif /* CONFIG_IPIPE */
if (!arch_counter_suspend_stop)
clocksource_counter.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
start_count = arch_timer_read_counter();
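Seen as a whole, the arch timer conversion follows the pattern any clockevent driver needs for the pipeline: hang a struct ipipe_timer off the clock_event_device (IRQ number, ack handler, rate) and register a struct __ipipe_tscinfo describing the free-running counter. A compressed sketch under those assumptions, with the device specifics elided and the identifiers being illustrative:

#include <linux/clockchips.h>
#include <linux/ipipe.h>
#include <linux/ipipe_tickdev.h>

static struct ipipe_timer my_itimer;

static struct __ipipe_tscinfo my_tsc = {
        .type = IPIPE_TSC_TYPE_FREERUNNING_ARCH,        /* architected 64bit counter */
        .u = { { .mask = 0xffffffffffffffffULL, }, },
};

static void my_timer_ack(void)
{
        /* Silence the pending tick at the timer hardware, as the
         * regular interrupt handler would. */
}

static void my_timer_ipipe_setup(struct clock_event_device *ced, int irq, u32 rate)
{
        ced->ipipe_timer = &my_itimer;
        ced->ipipe_timer->irq = irq;
        ced->ipipe_timer->ack = my_timer_ack;
        ced->ipipe_timer->freq = rate;

        my_tsc.freq = rate;
        __ipipe_tsc_register(&my_tsc);  /* counter exported to the co-kernel */
}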
diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c
index 88b2d38a7a61..cc0a8bc7f7e5 100644
--- a/drivers/clocksource/arm_global_timer.c
+++ b/drivers/clocksource/arm_global_timer.c
@@ -20,6 +20,7 @@
#include <linux/of_irq.h>
#include <linux/of_address.h>
#include <linux/sched_clock.h>
+#include <linux/ipipe_tickdev.h>
#include <asm/cputype.h>
@@ -46,10 +47,69 @@
* the units for all operations.
*/
static void __iomem *gt_base;
+static unsigned long gt_pbase;
+static struct clk *gt_clk;
static unsigned long gt_clk_rate;
static int gt_ppi;
static struct clock_event_device __percpu *gt_evt;
+#ifdef CONFIG_IPIPE
+
+static struct clocksource gt_clocksource;
+
+static int gt_clockevent_ack(struct clock_event_device *evt);
+
+static DEFINE_PER_CPU(struct ipipe_timer, gt_itimer);
+
+static unsigned int refresh_gt_freq(void)
+{
+ gt_clk_rate = clk_get_rate(gt_clk);
+
+ __clocksource_update_freq_hz(&gt_clocksource, gt_clk_rate);

+
+ return gt_clk_rate;
+}
+
+static inline void gt_ipipe_cs_setup(void)
+{
+ struct __ipipe_tscinfo tsc_info = {
+ .type = IPIPE_TSC_TYPE_FREERUNNING,
+ .freq = gt_clk_rate,
+ .counter_vaddr = (unsigned long)gt_base,
+ .u = {
+ {
+ .counter_paddr = gt_pbase,
+ .mask = 0xffffffff,
+ }
+ },
+ .refresh_freq = refresh_gt_freq,
+ };
+
+ __ipipe_tsc_register(&tsc_info);
+}
+
+static void gt_itimer_ack(void)
+{
+ struct clock_event_device *evt = this_cpu_ptr(gt_evt);
+ gt_clockevent_ack(evt);
+}
+
+static inline void gt_ipipe_evt_setup(struct clock_event_device *evt)
+{
+ evt->ipipe_timer = this_cpu_ptr(&gt_itimer);
+ evt->ipipe_timer->irq = evt->irq;
+ evt->ipipe_timer->ack = gt_itimer_ack;
+ evt->ipipe_timer->freq = gt_clk_rate;
+}
+
+#else
+
+static inline void gt_ipipe_cs_setup(void) { }
+
+static inline void gt_ipipe_evt_setup(struct clock_event_device *evt) { }
+
+#endif /* CONFIG_IPIPE */
+
/*
* To get the value from the Global Timer Counter register proceed as follows:
* 1. Read the upper 32-bit timer counter register
@@ -134,13 +194,11 @@ static int gt_clockevent_set_next_event(unsigned long evt,
return 0;
}
-static irqreturn_t gt_clockevent_interrupt(int irq, void *dev_id)
+static int gt_clockevent_ack(struct clock_event_device *evt)
{
- struct clock_event_device *evt = dev_id;
-
if (!(readl_relaxed(gt_base + GT_INT_STATUS) &
GT_INT_STATUS_EVENT_FLAG))
- return IRQ_NONE;
+ return IS_ENABLED(CONFIG_IPIPE);
/**
* ERRATA 740657( Global Timer can send 2 interrupts for
@@ -153,10 +211,23 @@ static irqreturn_t gt_clockevent_interrupt(int irq, void *dev_id)
* the Global Timer flag _after_ having incremented
* the Comparator register value to a higher value.
*/
- if (clockevent_state_oneshot(evt))
+ if (clockevent_ipipe_stolen(evt) || clockevent_state_oneshot(evt))
gt_compare_set(ULONG_MAX, 0);
writel_relaxed(GT_INT_STATUS_EVENT_FLAG, gt_base + GT_INT_STATUS);
+
+ return 1;
+}
+
+static irqreturn_t gt_clockevent_interrupt(int irq, void *dev_id)
+{
+ struct clock_event_device *evt = dev_id;
+
+ if (!clockevent_ipipe_stolen(evt)) {
+ if (!gt_clockevent_ack(evt))
+ return IRQ_NONE;
+ }
+
evt->event_handler(evt);
return IRQ_HANDLED;
@@ -177,6 +248,7 @@ static int gt_starting_cpu(unsigned int cpu)
clk->cpumask = cpumask_of(cpu);
clk->rating = 300;
clk->irq = gt_ppi;
+ gt_ipipe_evt_setup(clk);
clockevents_config_and_register(clk, gt_clk_rate,
1, 0xffffffff);
enable_percpu_irq(clk->irq, IRQ_TYPE_NONE);
@@ -249,13 +321,14 @@ static int __init gt_clocksource_init(void)
#ifdef CONFIG_CLKSRC_ARM_GLOBAL_TIMER_SCHED_CLOCK
sched_clock_register(gt_sched_clock_read, 64, gt_clk_rate);
#endif
+ gt_ipipe_cs_setup();
return clocksource_register_hz(&gt_clocksource, gt_clk_rate);
}
static int __init global_timer_of_register(struct device_node *np)
{
- struct clk *gt_clk;
int err = 0;
+ struct resource res;
/*
* In A9 r2p0 the comparators for each processor with the global timer
@@ -280,6 +353,11 @@ static int __init global_timer_of_register(struct device_node *np)
return -ENXIO;
}
+ if (of_address_to_resource(np, 0, &res))
+ res.start = 0;
+
+ gt_pbase = res.start;
+
gt_clk = of_clk_get(np, 0);
if (!IS_ERR(gt_clk)) {
err = clk_prepare_enable(gt_clk);
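For memory-mapped timers such as this one, the tsc descriptor also carries the virtual and physical addresses of the counter register, which the I-pipe tsc layer uses for direct reads. A minimal sketch of that registration; names are illustrative:

#include <linux/ipipe_tickdev.h>

static struct __ipipe_tscinfo my_mmio_tsc = {
        .type = IPIPE_TSC_TYPE_FREERUNNING,     /* 32bit up-counter */
        .u = { { .mask = 0xffffffff, }, },
};

static void my_mmio_tsc_register(void __iomem *vaddr, unsigned long paddr, u32 rate)
{
        my_mmio_tsc.freq = rate;
        my_mmio_tsc.counter_vaddr = (unsigned long)vaddr;
        my_mmio_tsc.u.counter_paddr = paddr;
        __ipipe_tsc_register(&my_mmio_tsc);
}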
diff --git a/drivers/clocksource/bcm2835_timer.c b/drivers/clocksource/bcm2835_timer.c
index b235f446ee50..24932b779ddd 100644
--- a/drivers/clocksource/bcm2835_timer.c
+++ b/drivers/clocksource/bcm2835_timer.c
@@ -16,6 +16,9 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/sched_clock.h>
+#include <linux/ipipe.h>
+#include <linux/ipipe_tickdev.h>
+#include <linux/time.h>
#include <asm/irq.h>
@@ -26,6 +29,7 @@
#define MAX_TIMER 3
#define DEFAULT_TIMER 3
+
struct bcm2835_timer {
void __iomem *control;
void __iomem *compare;
@@ -33,9 +37,53 @@ struct bcm2835_timer {
struct clock_event_device evt;
struct irqaction act;
};
-
static void __iomem *system_clock __read_mostly;
+#ifdef CONFIG_IPIPE
+
+static void __iomem *t_base;
+static unsigned long t_pbase;
+
+static inline void bcm2835_ipipe_cs_setup(unsigned int freq)
+{
+ struct __ipipe_tscinfo tsc_info = {
+ .type = IPIPE_TSC_TYPE_FREERUNNING,
+ .freq = freq,
+ .counter_vaddr = (unsigned long)t_base + 0x04,
+ .u = {
+ {
+ .counter_paddr = t_pbase + 0x04,
+ .mask = 0xffffffff,
+ }
+ },
+ };
+
+ __ipipe_tsc_register(&tsc_info);
+}
+
+static struct ipipe_timer bcm2835_itimer;
+
+static void bcm2835_itimer_ack(void)
+{
+ struct bcm2835_timer *timer = container_of(bcm2835_itimer.host_timer,
+ struct bcm2835_timer, evt);
+ writel(timer->match_mask, timer->control);
+}
+
+static inline void bcm2835_ipipe_evt_setup(struct clock_event_device *evt,
+ int freq)
+{
+ evt->ipipe_timer = &bcm2835_itimer;
+ evt->ipipe_timer->irq = evt->irq;
+ evt->ipipe_timer->ack = bcm2835_itimer_ack;
+ evt->ipipe_timer->freq = freq;
+}
+
+#else
+static inline void bcm2835_ipipe_cs_setup(unsigned int freq) { }
+static inline void bcm2835_ipipe_evt_setup(struct clock_event_device *evt, int freq) { }
+#endif /* CONFIG_IPIPE */
+
static u64 notrace bcm2835_sched_read(void)
{
return readl_relaxed(system_clock);
@@ -46,8 +94,7 @@ static int bcm2835_time_set_next_event(unsigned long event,
{
struct bcm2835_timer *timer = container_of(evt_dev,
struct bcm2835_timer, evt);
- writel_relaxed(readl_relaxed(system_clock) + event,
- timer->compare);
+ writel_relaxed(readl_relaxed(system_clock) + event, timer->compare);
return 0;
}
@@ -55,9 +102,13 @@ static irqreturn_t bcm2835_time_interrupt(int irq, void *dev_id)
{
struct bcm2835_timer *timer = dev_id;
void (*event_handler)(struct clock_event_device *);
+
+ if (clockevent_ipipe_stolen(&timer->evt)) {
+ goto handle;
+ }
if (readl_relaxed(timer->control) & timer->match_mask) {
writel_relaxed(timer->match_mask, timer->control);
-
+ handle:
event_handler = READ_ONCE(timer->evt.event_handler);
if (event_handler)
event_handler(&timer->evt);
@@ -80,6 +131,18 @@ static int __init bcm2835_timer_init(struct device_node *node)
return -ENXIO;
}
+ if (IS_ENABLED(CONFIG_IPIPE)) {
+ struct resource res;
+ int ret;
+
+ ret = of_address_to_resource(node, 0, &res);
+ if (ret)
+ res.start = 0;
+
+ t_base = base;
+ t_pbase = res.start;
+ }
+
ret = of_property_read_u32(node, "clock-frequency", &freq);
if (ret) {
pr_err("Can't read clock-frequency\n");
@@ -114,10 +177,21 @@ static int __init bcm2835_timer_init(struct device_node *node)
timer->evt.set_next_event = bcm2835_time_set_next_event;
timer->evt.cpumask = cpumask_of(0);
timer->act.name = node->name;
- timer->act.flags = IRQF_TIMER | IRQF_SHARED;
+ timer->act.flags = IRQF_TIMER;
timer->act.dev_id = timer;
timer->act.handler = bcm2835_time_interrupt;
+ if (IS_ENABLED(CONFIG_IPIPE)) {
+ bcm2835_ipipe_cs_setup(freq);
+ bcm2835_ipipe_evt_setup(&timer->evt, freq);
+ timer->evt.ipipe_timer = &bcm2835_itimer;
+ timer->evt.ipipe_timer->irq = irq;
+ timer->evt.ipipe_timer->ack = bcm2835_itimer_ack;
+ timer->evt.ipipe_timer->freq = freq;
+ } else {
+ timer->act.flags |= IRQF_SHARED;
+ }
+
ret = setup_irq(irq, &timer->act);
if (ret) {
pr_err("Can't set up timer IRQ\n");
diff --git a/drivers/clocksource/dw_apb_timer.c b/drivers/clocksource/dw_apb_timer.c
index 10ce69548f1b..f1e389689296 100644
--- a/drivers/clocksource/dw_apb_timer.c
+++ b/drivers/clocksource/dw_apb_timer.c
@@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
+#include <linux/ipipe.h>
#include <linux/io.h>
#include <linux/slab.h>
@@ -382,7 +383,7 @@ static void apbt_restart_clocksource(struct clocksource *cs)
*/
struct dw_apb_clocksource *
dw_apb_clocksource_init(unsigned rating, const char *name, void __iomem *base,
- unsigned long freq)
+ unsigned long phys, unsigned long freq)
{
struct dw_apb_clocksource *dw_cs = kzalloc(sizeof(*dw_cs), GFP_KERNEL);
@@ -397,10 +398,22 @@ dw_apb_clocksource_init(unsigned rating, const char *name, void __iomem *base,
dw_cs->cs.mask = CLOCKSOURCE_MASK(32);
dw_cs->cs.flags = CLOCK_SOURCE_IS_CONTINUOUS;
dw_cs->cs.resume = apbt_restart_clocksource;
+ dw_cs->phys = phys;
return dw_cs;
}
+#ifdef CONFIG_IPIPE
+static struct __ipipe_tscinfo apb_tsc_info = {
+ .type = IPIPE_TSC_TYPE_FREERUNNING_COUNTDOWN,
+ .u = {
+ .dec = {
+ .mask = 0xffffffffU,
+ },
+ },
+};
+#endif
+
/**
* dw_apb_clocksource_register() - register the APB clocksource.
*
@@ -409,6 +422,12 @@ dw_apb_clocksource_init(unsigned rating, const char *name, void __iomem *base,
void dw_apb_clocksource_register(struct dw_apb_clocksource *dw_cs)
{
clocksource_register_hz(&dw_cs->cs, dw_cs->timer.freq);
+#ifdef CONFIG_IPIPE
+ apb_tsc_info.u.dec.counter = (void *)(dw_cs->phys + APBTMR_N_CURRENT_VALUE);
+ apb_tsc_info.counter_vaddr = (unsigned long)dw_cs->timer.base + APBTMR_N_CURRENT_VALUE;
+ apb_tsc_info.freq = dw_cs->timer.freq;
+ __ipipe_tsc_register(&apb_tsc_info);
+#endif
}
/**
diff --git a/drivers/clocksource/dw_apb_timer_of.c b/drivers/clocksource/dw_apb_timer_of.c
index 6921b91b61ef..e1d2dd5b83cf 100644
--- a/drivers/clocksource/dw_apb_timer_of.c
+++ b/drivers/clocksource/dw_apb_timer_of.c
@@ -15,17 +15,21 @@
#include <linux/sched_clock.h>
static void __init timer_get_base_and_rate(struct device_node *np,
- void __iomem **base, u32 *rate)
+ void __iomem **base, unsigned long *phys,
+ u32 *rate)
{
struct clk *timer_clk;
+ struct resource res;
struct clk *pclk;
struct reset_control *rstc;
*base = of_iomap(np, 0);
- if (!*base)
+ if (!*base || of_address_to_resource(np, 0, &res))
panic("Unable to map regs for %pOFn", np);
+ *phys = res.start;
+
/*
* Reset the timer if the reset control is available, wiping
* out the state the firmware may have left it
@@ -65,13 +69,14 @@ static void __init add_clockevent(struct device_node *event_timer)
{
void __iomem *iobase;
struct dw_apb_clock_event_device *ced;
+ unsigned long phys;
u32 irq, rate;
irq = irq_of_parse_and_map(event_timer, 0);
if (irq == 0)
panic("No IRQ for clock event timer");
- timer_get_base_and_rate(event_timer, &iobase, &rate);
+ timer_get_base_and_rate(event_timer, &iobase, &phys, &rate);
ced = dw_apb_clockevent_init(0, event_timer->name, 300, iobase, irq,
rate);
@@ -88,11 +93,12 @@ static void __init add_clocksource(struct device_node *source_timer)
{
void __iomem *iobase;
struct dw_apb_clocksource *cs;
+ unsigned long phys;
u32 rate;
- timer_get_base_and_rate(source_timer, &iobase, &rate);
+ timer_get_base_and_rate(source_timer, &iobase, &phys, &rate);
- cs = dw_apb_clocksource_init(300, source_timer->name, iobase, rate);
+ cs = dw_apb_clocksource_init(300, source_timer->name, iobase, phys, rate);
if (!cs)
panic("Unable to initialise clocksource device");
@@ -121,11 +127,12 @@ static const struct of_device_id sptimer_ids[] __initconst = {
static void __init init_sched_clock(void)
{
struct device_node *sched_timer;
+ unsigned long phys;
sched_timer = of_find_matching_node(NULL, sptimer_ids);
if (sched_timer) {
timer_get_base_and_rate(sched_timer, &sched_io_base,
- &sched_rate);
+ &phys, &sched_rate);
of_node_put(sched_timer);
}
diff --git a/drivers/clocksource/timer-imx-gpt.c b/drivers/clocksource/timer-imx-gpt.c
index 706c0d0ff56c..1db8f5b8da93 100644
--- a/drivers/clocksource/timer-imx-gpt.c
+++ b/drivers/clocksource/timer-imx-gpt.c
@@ -16,6 +16,8 @@
#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/of_irq.h>
+#include <linux/ipipe.h>
+#include <linux/ipipe_tickdev.h>
#include <soc/imx/timer.h>
/*
@@ -61,6 +63,9 @@
struct imx_timer {
enum imx_gpt_type type;
+#ifdef CONFIG_IPIPE
+ unsigned long pbase;
+#endif
void __iomem *base;
int irq;
struct clk *clk_per;
@@ -252,6 +257,30 @@ static int mxc_set_oneshot(struct clock_event_device *ced)
return 0;
}
+#ifdef CONFIG_IPIPE
+
+static struct imx_timer *global_imx_timer;
+
+static void mxc_timer_ack(void)
+{
+ global_imx_timer->gpt->gpt_irq_acknowledge(global_imx_timer);
+}
+
+static struct __ipipe_tscinfo tsc_info = {
+ .type = IPIPE_TSC_TYPE_FREERUNNING,
+ .u = {
+ {
+ .mask = 0xffffffff,
+ },
+ },
+};
+
+static struct ipipe_timer mxc_itimer = {
+ .ack = mxc_timer_ack,
+};
+
+#endif
+
/*
* IRQ handler for the timer
*/
@@ -263,7 +292,8 @@ static irqreturn_t mxc_timer_interrupt(int irq, void *dev_id)
tstat = readl_relaxed(imxtm->base + imxtm->gpt->reg_tstat);
- imxtm->gpt->gpt_irq_acknowledge(imxtm);
+ if (!clockevent_ipipe_stolen(ced))
+ imxtm->gpt->gpt_irq_acknowledge(imxtm);
ced->event_handler(ced);
@@ -284,6 +314,9 @@ static int __init mxc_clockevent_init(struct imx_timer *imxtm)
ced->rating = 200;
ced->cpumask = cpumask_of(0);
ced->irq = imxtm->irq;
+#ifdef CONFIG_IPIPE
+ ced->ipipe_timer = &mxc_itimer;
+#endif
clockevents_config_and_register(ced, clk_get_rate(imxtm->clk_per),
0xff, 0xfffffffe);
@@ -423,6 +456,17 @@ static int __init _mxc_timer_init(struct imx_timer *imxtm)
if (ret)
return ret;
+#ifdef CONFIG_IPIPE
+ tsc_info.u.counter_paddr = imxtm->pbase + imxtm->gpt->reg_tcn;
+ tsc_info.counter_vaddr = (unsigned long)imxtm->base + imxtm->gpt->reg_tcn;
+ tsc_info.freq = clk_get_rate(imxtm->clk_per);
+ __ipipe_tsc_register(&tsc_info);
+ mxc_itimer.irq = imxtm->irq;
+ mxc_itimer.freq = clk_get_rate(imxtm->clk_per);
+ mxc_itimer.min_delay_ticks = ipipe_timer_ns2ticks(&mxc_itimer, 2000);
+ global_imx_timer = imxtm;
+#endif /* CONFIG_IPIPE */
+
return mxc_clockevent_init(imxtm);
}
@@ -438,6 +482,9 @@ void __init mxc_timer_init(unsigned long pbase, int irq, enum imx_gpt_type type)
imxtm->base = ioremap(pbase, SZ_4K);
BUG_ON(!imxtm->base);
+#ifdef CONFIG_IPIPE
+ imxtm->pbase = pbase;
+#endif
imxtm->type = type;
imxtm->irq = irq;
@@ -449,6 +496,7 @@ static int __init mxc_timer_init_dt(struct device_node *np, enum imx_gpt_type t
{
struct imx_timer *imxtm;
static int initialized;
+ struct resource res;
int ret;
/* Support one instance only */
@@ -467,6 +515,13 @@ static int __init mxc_timer_init_dt(struct device_node *np, enum imx_gpt_type t
if (imxtm->irq <= 0)
return -EINVAL;
+ if (of_address_to_resource(np, 0, &res))
+ res.start = 0;
+
+#ifdef CONFIG_IPIPE
+ imxtm->pbase = res.start;
+#endif
+
imxtm->clk_ipg = of_clk_get_by_name(np, "ipg");
/* Try osc_per first, and fall back to per otherwise */
diff --git a/drivers/clocksource/timer-sp804.c b/drivers/clocksource/timer-sp804.c
index c9aa0498fb84..e4996c2afaa1 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -17,11 +17,25 @@
#include <linux/of_clk.h>
#include <linux/of_irq.h>
#include <linux/sched_clock.h>
+#include <linux/module.h>
+#include <linux/ipipe.h>
+#include <linux/ipipe_tickdev.h>
#include <clocksource/timer-sp804.h>
#include "timer-sp.h"
+#ifdef CONFIG_IPIPE
+static struct __ipipe_tscinfo tsc_info = {
+ .type = IPIPE_TSC_TYPE_FREERUNNING_COUNTDOWN,
+ .u = {
+ {
+ .mask = 0xffffffff,
+ },
+ },
+};
+#endif /* CONFIG_IPIPE */
+
static long __init sp804_get_clock_rate(struct clk *clk)
{
long rate;
@@ -66,6 +80,7 @@ void __init sp804_timer_disable(void __iomem *base)
}
int __init __sp804_clocksource_and_sched_clock_init(void __iomem *base,
+ unsigned long phys,
const char *name,
struct clk *clk,
int use_sched_clock)
@@ -100,6 +115,12 @@ int __init __sp804_clocksource_and_sched_clock_init(void __iomem *base,
sched_clock_register(sp804_read, 32, rate);
}
+#ifdef CONFIG_IPIPE
+ tsc_info.freq = rate;
+ tsc_info.counter_vaddr = (unsigned long)base + TIMER_VALUE;
+ tsc_info.u.counter_paddr = phys + TIMER_VALUE;
+ __ipipe_tsc_register(&tsc_info);
+#endif
return 0;
}
@@ -214,6 +235,7 @@ static int __init sp804_of_init(struct device_node *np)
u32 irq_num = 0;
struct clk *clk1, *clk2;
const char *name = of_get_property(np, "compatible", NULL);
+ struct resource res;
if (initialized) {
pr_debug("%pOF: skipping further SP804 timer device\n", np);
@@ -247,6 +269,9 @@ static int __init sp804_of_init(struct device_node *np)
if (irq <= 0)
goto err;
+ if (of_address_to_resource(np, 0, &res))
+ res.start = 0;
+
of_property_read_u32(np, "arm,sp804-has-irq", &irq_num);
if (irq_num == 2) {
@@ -254,7 +279,7 @@ static int __init sp804_of_init(struct device_node *np)
if (ret)
goto err;
- ret = __sp804_clocksource_and_sched_clock_init(base, name, clk1, 1);
+ ret = __sp804_clocksource_and_sched_clock_init(base, res.start, name, clk1, 1);
if (ret)
goto err;
} else {
@@ -264,7 +289,7 @@ static int __init sp804_of_init(struct device_node *np)
goto err;
ret =__sp804_clocksource_and_sched_clock_init(base + TIMER_2_BASE,
- name, clk2, 1);
+ res.start, name, clk2, 1);
if (ret)
goto err;
}
@@ -284,6 +309,7 @@ static int __init integrator_cp_of_init(struct device_node *np)
int irq, ret = -EINVAL;
const char *name = of_get_property(np, "compatible", NULL);
struct clk *clk;
+ struct resource res;
base = of_iomap(np, 0);
if (!base) {
@@ -303,8 +329,11 @@ static int __init integrator_cp_of_init(struct device_node *np)
if (init_count == 2 || !of_device_is_available(np))
goto err;
+ if (of_address_to_resource(np, 0, &res))
+ res.start = 0;
+
if (!init_count) {
- ret = __sp804_clocksource_and_sched_clock_init(base, name, clk, 0);
+ ret = __sp804_clocksource_and_sched_clock_init(base, res.start, name, clk, 0);
if (ret)
goto err;
} else {
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 73f08cda21e0..d771356ec79d 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -17,6 +17,7 @@
#include <linux/pm_qos.h>
#include <linux/cpu.h>
#include <linux/cpuidle.h>
+#include <linux/ipipe.h>
#include <linux/ktime.h>
#include <linux/hrtimer.h>
#include <linux/module.h>
@@ -205,6 +206,19 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
ktime_t time_start, time_end;
/*
+ * A co-kernel running on the head stage of the IRQ pipeline
+ * may deny switching to a deeper C-state. If so, call the
+ * default idle routine instead. If the co-kernel cannot bear
+ * with the latency induced by the default idling operation,
+ * then CPUIDLE is not usable and should be disabled at build
+ * time.
+ */
+ if (!ipipe_enter_cpuidle(dev, target_state)) {
+ default_idle_call();
+ return -EBUSY;
+ }
+
+ /*
* Tell the time framework to switch to a broadcast timer because our
* local timer will be shut down. If a local timer is used from another
* CPU as a broadcast timer, this call may fail if it is not available.
@@ -228,6 +242,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
stop_critical_timings();
entered_state = target_state->enter(dev, drv, index);
+ hard_cond_local_irq_enable();
start_critical_timings();
sched_clock_idle_wakeup_event();
diff --git a/drivers/gpio/gpio-davinci.c b/drivers/gpio/gpio-davinci.c
index e0b025689625..358254edbde6 100644
--- a/drivers/gpio/gpio-davinci.c
+++ b/drivers/gpio/gpio-davinci.c
@@ -22,7 +22,7 @@
#include <linux/platform_data/gpio-davinci.h>
#include <linux/irqchip/chained_irq.h>
#include <linux/spinlock.h>
-
+#include <linux/ipipe.h>
#include <asm-generic/gpio.h>
#define MAX_REGS_BANKS 5
@@ -333,7 +333,7 @@ static struct irq_chip gpio_irqchip = {
.irq_enable = gpio_irq_enable,
.irq_disable = gpio_irq_disable,
.irq_set_type = gpio_irq_type,
- .flags = IRQCHIP_SET_TYPE_MASKED,
+ .flags = IRQCHIP_SET_TYPE_MASKED | IRQCHIP_PIPELINE_SAFE,
};
static void gpio_irq_handler(struct irq_desc *desc)
@@ -376,7 +376,7 @@ static void gpio_irq_handler(struct irq_desc *desc)
*/
hw_irq = (bank_num / 2) * 32 + bit;
- generic_handle_irq(
+ ipipe_handle_demuxed_irq(
irq_find_mapping(d->irq_domain, hw_irq));
}
}
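The davinci handler above, and the GPIO drivers that follow, all make the same change to their chained handlers: decoded child interrupts are fed back through ipipe_handle_demuxed_irq() instead of generic_handle_irq(), so a pin interrupt claimed by the head stage is still delivered through the pipeline. A minimal sketch of such a parent handler; the bank structure and register offset are hypothetical:

#include <linux/bitops.h>
#include <linux/io.h>
#include <linux/ipipe.h>
#include <linux/irq.h>
#include <linux/irqdomain.h>
#include <linux/irqchip/chained_irq.h>

struct my_gpio_bank {                   /* hypothetical driver state */
        void __iomem *base;
        struct irq_domain *domain;
};

#define MY_GPIO_ISR     0x18            /* hypothetical status register */

static void my_gpio_parent_handler(struct irq_desc *desc)
{
        struct irq_chip *chip = irq_desc_get_chip(desc);
        struct my_gpio_bank *bank = irq_desc_get_handler_data(desc);
        unsigned long pending;
        int bit;

        chained_irq_enter(chip, desc);
        pending = readl_relaxed(bank->base + MY_GPIO_ISR);
        for_each_set_bit(bit, &pending, 32)
                ipipe_handle_demuxed_irq(irq_find_mapping(bank->domain, bit));
        chained_irq_exit(chip, desc);
}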
diff --git a/drivers/gpio/gpio-mvebu.c b/drivers/gpio/gpio-mvebu.c
index b5ae28fce9a8..8de9ffd87acf 100644
--- a/drivers/gpio/gpio-mvebu.c
+++ b/drivers/gpio/gpio-mvebu.c
@@ -52,6 +52,7 @@
#include <linux/pwm.h>
#include <linux/regmap.h>
#include <linux/slab.h>
+#include <linux/ipipe.h>
/*
* GPIO unit register offsets.
@@ -402,10 +403,11 @@ static void mvebu_gpio_irq_ack(struct irq_data *d)
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct mvebu_gpio_chip *mvchip = gc->private;
u32 mask = d->mask;
+ unsigned long flags;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
mvebu_gpio_write_edge_cause(mvchip, ~mask);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
static void mvebu_gpio_edge_irq_mask(struct irq_data *d)
@@ -414,11 +416,12 @@ static void mvebu_gpio_edge_irq_mask(struct irq_data *d)
struct mvebu_gpio_chip *mvchip = gc->private;
struct irq_chip_type *ct = irq_data_get_chip_type(d);
u32 mask = d->mask;
+ unsigned long flags;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
ct->mask_cache_priv &= ~mask;
mvebu_gpio_write_edge_mask(mvchip, ct->mask_cache_priv);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
static void mvebu_gpio_edge_irq_unmask(struct irq_data *d)
@@ -427,11 +430,12 @@ static void mvebu_gpio_edge_irq_unmask(struct irq_data *d)
struct mvebu_gpio_chip *mvchip = gc->private;
struct irq_chip_type *ct = irq_data_get_chip_type(d);
u32 mask = d->mask;
+ unsigned long flags;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
ct->mask_cache_priv |= mask;
mvebu_gpio_write_edge_mask(mvchip, ct->mask_cache_priv);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
static void mvebu_gpio_level_irq_mask(struct irq_data *d)
@@ -440,11 +444,12 @@ static void mvebu_gpio_level_irq_mask(struct irq_data *d)
struct mvebu_gpio_chip *mvchip = gc->private;
struct irq_chip_type *ct = irq_data_get_chip_type(d);
u32 mask = d->mask;
+ unsigned long flags;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
ct->mask_cache_priv &= ~mask;
mvebu_gpio_write_level_mask(mvchip, ct->mask_cache_priv);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
static void mvebu_gpio_level_irq_unmask(struct irq_data *d)
@@ -453,11 +458,12 @@ static void mvebu_gpio_level_irq_unmask(struct irq_data *d)
struct mvebu_gpio_chip *mvchip = gc->private;
struct irq_chip_type *ct = irq_data_get_chip_type(d);
u32 mask = d->mask;
+ unsigned long flags;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
ct->mask_cache_priv |= mask;
mvebu_gpio_write_level_mask(mvchip, ct->mask_cache_priv);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
/*****************************************************************************
@@ -591,7 +597,7 @@ static void mvebu_gpio_irq_handler(struct irq_desc *desc)
polarity);
}
- generic_handle_irq(irq);
+ ipipe_handle_demuxed_irq(irq);
}
chained_irq_exit(chip, desc);
@@ -1235,6 +1241,7 @@ static int mvebu_gpio_probe(struct platform_device *pdev)
ct->chip.irq_unmask = mvebu_gpio_level_irq_unmask;
ct->chip.irq_set_type = mvebu_gpio_irq_set_type;
ct->chip.name = mvchip->chip.label;
+ ct->chip.flags = IRQCHIP_PIPELINE_SAFE;
ct = &gc->chip_types[1];
ct->type = IRQ_TYPE_EDGE_RISING | IRQ_TYPE_EDGE_FALLING;
@@ -1244,6 +1251,7 @@ static int mvebu_gpio_probe(struct platform_device *pdev)
ct->chip.irq_set_type = mvebu_gpio_irq_set_type;
ct->handler = handle_edge_irq;
ct->chip.name = mvchip->chip.label;
+ ct->chip.flags = IRQCHIP_PIPELINE_SAFE;
/*
* Setup the interrupt handlers. Each chip can have up to 4
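Note the locking convention used in the mvebu hunks above and in the AIC5 changes further down: with this patch, irq_gc_lock() returns the hard interrupt state, which the caller must hand back to irq_gc_unlock(). Converted callbacks therefore look like the following sketch (a generic chip with a private mask cache; the callback name is illustrative):

#include <linux/irq.h>

static void my_gc_irq_mask(struct irq_data *d)
{
        struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
        struct irq_chip_type *ct = irq_data_get_chip_type(d);
        unsigned long flags;

        flags = irq_gc_lock(gc);        /* hard IRQs off + generic chip lock */
        ct->mask_cache_priv &= ~d->mask;
        irq_reg_writel(gc, ct->mask_cache_priv, ct->regs.mask);
        irq_gc_unlock(gc, flags);
}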
diff --git a/drivers/gpio/gpio-mxc.c b/drivers/gpio/gpio-mxc.c
index 2e4b6b176875..c23117fe0f7e 100644
--- a/drivers/gpio/gpio-mxc.c
+++ b/drivers/gpio/gpio-mxc.c
@@ -22,6 +22,7 @@
#include <linux/of.h>
#include <linux/of_device.h>
#include <linux/bug.h>
+#include <linux/ipipe.h>
enum mxc_gpio_hwtype {
IMX1_GPIO, /* runs on i.mx1 */
@@ -266,7 +267,7 @@ static void mxc_gpio_irq_handler(struct mxc_gpio_port *port, u32 irq_stat)
if (port->both_edges & (1 << irqoffset))
mxc_flip_edge(port, irqoffset);
- generic_handle_irq(irq_find_mapping(port->domain, irqoffset));
+ ipipe_handle_demuxed_irq(irq_find_mapping(port->domain, irqoffset));
irq_stat &= ~(1 << irqoffset);
}
@@ -359,7 +360,7 @@ static int mxc_gpio_init_gc(struct mxc_gpio_port *port, int irq_base)
ct->chip.irq_unmask = irq_gc_mask_set_bit;
ct->chip.irq_set_type = gpio_set_irq_type;
ct->chip.irq_set_wake = gpio_set_wake_irq;
- ct->chip.flags = IRQCHIP_MASK_ON_SUSPEND;
+ ct->chip.flags = IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE;
ct->regs.ack = GPIO_ISR;
ct->regs.mask = GPIO_IMR;
diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index ce6954390cfd..d430b64c405f 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -24,6 +24,7 @@
#include <linux/of_device.h>
#include <linux/gpio/driver.h>
#include <linux/bitops.h>
+#include <linux/ipipe.h>
#include <linux/platform_data/gpio-omap.h>
#define OMAP4_GPIO_DEBOUNCINGTIME_MASK 0xFF
@@ -55,7 +56,11 @@ struct gpio_bank {
u32 saved_datain;
u32 level_mask;
u32 toggle_mask;
+#ifdef CONFIG_IPIPE
+ ipipe_spinlock_t lock;
+#else
raw_spinlock_t lock;
+#endif
raw_spinlock_t wa_lock;
struct gpio_chip chip;
struct clk *dbck;
@@ -554,18 +559,18 @@ static int omap_gpio_wake_enable(struct irq_data *d, unsigned int enable)
* line's interrupt handler has been run, we may miss some nested
* interrupts.
*/
-static irqreturn_t omap_gpio_irq_handler(int irq, void *gpiobank)
+static void __omap_gpio_irq_handler(struct gpio_bank *bank)
{
void __iomem *isr_reg = NULL;
u32 enabled, isr, edge;
unsigned int bit;
- struct gpio_bank *bank = gpiobank;
unsigned long wa_lock_flags;
unsigned long lock_flags;
isr_reg = bank->base + bank->regs->irqstatus;
if (WARN_ON(!isr_reg))
- goto exit;
+ return;
+
if (WARN_ONCE(!pm_runtime_active(bank->chip.parent),
"gpio irq%i while runtime suspended?\n", irq))
@@ -610,17 +615,38 @@ static irqreturn_t omap_gpio_irq_handler(int irq, void *gpiobank)
raw_spin_lock_irqsave(&bank->wa_lock, wa_lock_flags);
- generic_handle_irq(irq_find_mapping(bank->chip.irq.domain,
- bit));
+ ipipe_handle_demuxed_irq(irq_find_mapping(bank->chip.irq.domain,
+ bit));
raw_spin_unlock_irqrestore(&bank->wa_lock,
wa_lock_flags);
}
}
-exit:
+}
+
+#ifdef CONFIG_IPIPE
+
+static void omap_gpio_irq_handler(struct irq_desc *d)
+{
+ struct gpio_bank *bank = irq_desc_get_handler_data(d);
+ __omap_gpio_irq_handler(bank);
+}
+
+#else
+
+static irqreturn_t omap_gpio_irq_handler(int irq, void *gpiobank)
+{
+ struct gpio_bank *bank = gpiobank;
+
+ pm_runtime_get_sync(bank->chip.parent);
+ __omap_gpio_irq_handler(bank);
+ pm_runtime_put(bank->chip.parent);
+
return IRQ_HANDLED;
}
+#endif
+
static unsigned int omap_gpio_irq_startup(struct irq_data *d)
{
struct gpio_bank *bank = omap_irq_data_get_bank(d);
@@ -683,6 +709,19 @@ static void omap_gpio_mask_irq(struct irq_data *d)
raw_spin_unlock_irqrestore(&bank->lock, flags);
}
+static void omap_gpio_mask_ack_irq(struct irq_data *d)
+{
+ struct gpio_bank *bank = omap_irq_data_get_bank(d);
+ unsigned offset = d->hwirq;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&bank->lock, flags);
+ omap_set_gpio_irqenable(bank, offset, 0);
+ omap_set_gpio_triggering(bank, offset, IRQ_TYPE_NONE);
+ omap_clear_gpio_irqstatus(bank, offset);
+ raw_spin_unlock_irqrestore(&bank->lock, flags);
+}
+
static void omap_gpio_unmask_irq(struct irq_data *d)
{
struct gpio_bank *bank = omap_irq_data_get_bank(d);
@@ -1042,11 +1081,16 @@ static int omap_gpio_chip_init(struct gpio_bank *bank, struct irq_chip *irqc)
return ret;
}
+#ifdef CONFIG_IPIPE
+ irq_set_chained_handler_and_data(bank->irq,
+ omap_gpio_irq_handler, bank);
+#else
ret = devm_request_irq(bank->chip.parent, bank->irq,
omap_gpio_irq_handler,
0, dev_name(bank->chip.parent), bank);
if (ret)
gpiochip_remove(&bank->chip);
+#endif
if (!bank->is_mpuio)
gpio += bank->width;
@@ -1377,13 +1421,14 @@ static int omap_gpio_probe(struct platform_device *pdev)
irqc->irq_shutdown = omap_gpio_irq_shutdown,
irqc->irq_ack = dummy_irq_chip.irq_ack,
irqc->irq_mask = omap_gpio_mask_irq,
+ irqc->irq_mask_ack = omap_gpio_mask_ack_irq,
irqc->irq_unmask = omap_gpio_unmask_irq,
irqc->irq_set_type = omap_gpio_irq_type,
irqc->irq_set_wake = omap_gpio_wake_enable,
irqc->irq_bus_lock = omap_gpio_irq_bus_lock,
irqc->irq_bus_sync_unlock = gpio_irq_bus_sync_unlock,
irqc->name = dev_name(&pdev->dev);
- irqc->flags = IRQCHIP_MASK_ON_SUSPEND;
+ irqc->flags = IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE;
irqc->parent_device = dev;
bank->irq = platform_get_irq(pdev, 0);
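The lock type switch above is the other recurring pattern in the GPIO drivers: state shared with irqchip callbacks is guarded by an ipipe_spinlock_t, so that the unchanged raw_spin_lock_irqsave() calls also disable hard interrupts and keep the head stage out of the critical section (assuming the I-pipe spinlock support handles this lock type that way). Schematically, with a hypothetical structure:

#include <linux/spinlock.h>

struct my_bank {
#ifdef CONFIG_IPIPE
        ipipe_spinlock_t lock;          /* raw_spin_lock_irqsave() hard-disables IRQs */
#else
        raw_spinlock_t lock;
#endif
        u32 level_mask;                 /* ...state touched from irqchip callbacks... */
};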
diff --git a/drivers/gpio/gpio-pl061.c b/drivers/gpio/gpio-pl061.c
index 722ce5cf861e..6d3150312f95 100644
--- a/drivers/gpio/gpio-pl061.c
+++ b/drivers/gpio/gpio-pl061.c
@@ -23,6 +23,7 @@
#include <linux/slab.h>
#include <linux/pinctrl/consumer.h>
#include <linux/pm.h>
+#include <linux/ipipe.h>
#define GPIODIR 0x400
#define GPIOIS 0x404
@@ -47,7 +48,11 @@ struct pl061_context_save_regs {
#endif
struct pl061 {
+#ifdef CONFIG_IPIPE
+ ipipe_spinlock_t lock;
+#else
raw_spinlock_t lock;
+#endif
void __iomem *base;
struct gpio_chip gc;
@@ -219,8 +224,8 @@ static void pl061_irq_handler(struct irq_desc *desc)
pending = readb(pl061->base + GPIOMIS);
if (pending) {
for_each_set_bit(offset, &pending, PL061_GPIO_NR)
- generic_handle_irq(irq_find_mapping(gc->irq.domain,
- offset));
+ ipipe_handle_demuxed_irq(irq_find_mapping(gc->irq.domain,
+ offset));
}
chained_irq_exit(irqchip, desc);
@@ -231,6 +236,22 @@ static void pl061_irq_mask(struct irq_data *d)
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
struct pl061 *pl061 = gpiochip_get_data(gc);
u8 mask = BIT(irqd_to_hwirq(d) % PL061_GPIO_NR);
+ unsigned long flags;
+ u8 gpioie;
+
+ raw_spin_lock_irqsave(&pl061->lock, flags);
+ gpioie = readb(pl061->base + GPIOIE) & ~mask;
+ writeb(gpioie, pl061->base + GPIOIE);
+ ipipe_lock_irq(d->irq);
+ raw_spin_unlock_irqrestore(&pl061->lock, flags);
+}
+
+#ifdef CONFIG_IPIPE
+static void pl061_irq_mask_ack(struct irq_data *d)
+{
+ struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+ struct pl061 *pl061 = gpiochip_get_data(gc);
+ u8 mask = BIT(irqd_to_hwirq(d) % PL061_GPIO_NR);
u8 gpioie;
raw_spin_lock(&pl061->lock);
@@ -238,6 +259,7 @@ static void pl061_irq_mask(struct irq_data *d)
writeb(gpioie, pl061->base + GPIOIE);
raw_spin_unlock(&pl061->lock);
}
+#endif
static void pl061_irq_unmask(struct irq_data *d)
{
@@ -320,6 +342,10 @@ static int pl061_probe(struct amba_device *adev, const struct amba_id *id)
pl061->irq_chip.irq_unmask = pl061_irq_unmask;
pl061->irq_chip.irq_set_type = pl061_irq_type;
pl061->irq_chip.irq_set_wake = pl061_irq_set_wake;
+#ifdef CONFIG_IPIPE
+ pl061->irq_chip.irq_mask_ack = pl061_irq_mask_ack;
+ pl061->irq_chip.flags = IRQCHIP_PIPELINE_SAFE;
+#endif
writeb(0, pl061->base + GPIOIE); /* disable irqs */
irq = adev->irq[0];
diff --git a/drivers/gpio/gpio-zynq.c b/drivers/gpio/gpio-zynq.c
index 25a42605aa81..faaa3dc8203a 100644
--- a/drivers/gpio/gpio-zynq.c
+++ b/drivers/gpio/gpio-zynq.c
@@ -10,6 +10,7 @@
#include <linux/gpio/driver.h>
#include <linux/init.h>
#include <linux/interrupt.h>
+#include <linux/ipipe.h>
#include <linux/io.h>
#include <linux/module.h>
#include <linux/platform_device.h>
@@ -126,6 +127,8 @@ struct zynq_gpio {
struct gpio_regs context;
};
+static IPIPE_DEFINE_RAW_SPINLOCK(zynq_gpio_lock);
+
/**
* struct zynq_platform_data - zynq gpio platform data structure
* @label: string to store in gpio->label
@@ -298,6 +301,7 @@ static int zynq_gpio_dir_in(struct gpio_chip *chip, unsigned int pin)
u32 reg;
unsigned int bank_num, bank_pin_num;
struct zynq_gpio *gpio = gpiochip_get_data(chip);
+ unsigned long flags;
zynq_gpio_get_bank_pin(pin, &bank_num, &bank_pin_num, gpio);
@@ -309,10 +313,12 @@ static int zynq_gpio_dir_in(struct gpio_chip *chip, unsigned int pin)
(bank_pin_num == 7 || bank_pin_num == 8))
return -EINVAL;
+ raw_spin_lock_irqsave(&zynq_gpio_lock, flags);
/* clear the bit in direction mode reg to set the pin as input */
reg = readl_relaxed(gpio->base_addr + ZYNQ_GPIO_DIRM_OFFSET(bank_num));
reg &= ~BIT(bank_pin_num);
writel_relaxed(reg, gpio->base_addr + ZYNQ_GPIO_DIRM_OFFSET(bank_num));
+ raw_spin_unlock_irqrestore(&zynq_gpio_lock, flags);
return 0;
}
@@ -335,9 +341,11 @@ static int zynq_gpio_dir_out(struct gpio_chip *chip, unsigned int pin,
u32 reg;
unsigned int bank_num, bank_pin_num;
struct zynq_gpio *gpio = gpiochip_get_data(chip);
+ unsigned long flags;
zynq_gpio_get_bank_pin(pin, &bank_num, &bank_pin_num, gpio);
+ raw_spin_lock_irqsave(&zynq_gpio_lock, flags);
/* set the GPIO pin as output */
reg = readl_relaxed(gpio->base_addr + ZYNQ_GPIO_DIRM_OFFSET(bank_num));
reg |= BIT(bank_pin_num);
@@ -347,6 +355,7 @@ static int zynq_gpio_dir_out(struct gpio_chip *chip, unsigned int pin,
reg = readl_relaxed(gpio->base_addr + ZYNQ_GPIO_OUTEN_OFFSET(bank_num));
reg |= BIT(bank_pin_num);
writel_relaxed(reg, gpio->base_addr + ZYNQ_GPIO_OUTEN_OFFSET(bank_num));
+ raw_spin_unlock_irqrestore(&zynq_gpio_lock, flags);
/* set the state of the pin */
zynq_gpio_set_value(chip, pin, state);
@@ -388,11 +397,15 @@ static void zynq_gpio_irq_mask(struct irq_data *irq_data)
unsigned int device_pin_num, bank_num, bank_pin_num;
struct zynq_gpio *gpio =
gpiochip_get_data(irq_data_get_irq_chip_data(irq_data));
+ unsigned long flags;
device_pin_num = irq_data->hwirq;
zynq_gpio_get_bank_pin(device_pin_num, &bank_num, &bank_pin_num, gpio);
+ raw_spin_lock_irqsave(&zynq_gpio_lock, flags);
+ ipipe_lock_irq(irq_data->irq);
writel_relaxed(BIT(bank_pin_num),
gpio->base_addr + ZYNQ_GPIO_INTDIS_OFFSET(bank_num));
+ raw_spin_unlock_irqrestore(&zynq_gpio_lock, flags);
}
/**
@@ -409,11 +422,15 @@ static void zynq_gpio_irq_unmask(struct irq_data *irq_data)
unsigned int device_pin_num, bank_num, bank_pin_num;
struct zynq_gpio *gpio =
gpiochip_get_data(irq_data_get_irq_chip_data(irq_data));
+ unsigned long flags;
device_pin_num = irq_data->hwirq;
zynq_gpio_get_bank_pin(device_pin_num, &bank_num, &bank_pin_num, gpio);
+ raw_spin_lock_irqsave(&zynq_gpio_lock, flags);
writel_relaxed(BIT(bank_pin_num),
gpio->base_addr + ZYNQ_GPIO_INTEN_OFFSET(bank_num));
+ ipipe_unlock_irq(irq_data->irq);
+ raw_spin_unlock_irqrestore(&zynq_gpio_lock, flags);
}
/**
@@ -571,11 +588,47 @@ static void zynq_gpio_irq_relres(struct irq_data *d)
pm_runtime_put(chip->parent);
}
+#ifdef CONFIG_IPIPE
+
+static void zynq_gpio_hold_irq(struct irq_data *irq_data)
+{
+ unsigned int device_pin_num, bank_num, bank_pin_num;
+ struct zynq_gpio *gpio =
+ gpiochip_get_data(irq_data_get_irq_chip_data(irq_data));
+
+ device_pin_num = irq_data->hwirq;
+ zynq_gpio_get_bank_pin(device_pin_num, &bank_num, &bank_pin_num, gpio);
+ raw_spin_lock(&zynq_gpio_lock);
+ writel_relaxed(BIT(bank_pin_num),
+ gpio->base_addr + ZYNQ_GPIO_INTDIS_OFFSET(bank_num));
+ writel_relaxed(BIT(bank_pin_num),
+ gpio->base_addr + ZYNQ_GPIO_INTSTS_OFFSET(bank_num));
+ raw_spin_unlock(&zynq_gpio_lock);
+}
+
+static void zynq_gpio_release_irq(struct irq_data *irq_data)
+{
+ unsigned int device_pin_num, bank_num, bank_pin_num;
+ struct zynq_gpio *gpio =
+ gpiochip_get_data(irq_data_get_irq_chip_data(irq_data));
+
+ device_pin_num = irq_data->hwirq;
+ zynq_gpio_get_bank_pin(device_pin_num, &bank_num, &bank_pin_num, gpio);
+ writel_relaxed(BIT(bank_pin_num),
+ gpio->base_addr + ZYNQ_GPIO_INTEN_OFFSET(bank_num));
+}
+
+#endif /* CONFIG_IPIPE */
+
/* irq chip descriptor */
static struct irq_chip zynq_gpio_level_irqchip = {
- .name = DRIVER_NAME,
+ .name = DRIVER_NAME "-level",
.irq_enable = zynq_gpio_irq_enable,
.irq_eoi = zynq_gpio_irq_ack,
+#ifdef CONFIG_IPIPE
+ .irq_hold = zynq_gpio_hold_irq,
+ .irq_release = zynq_gpio_release_irq,
+#endif
.irq_mask = zynq_gpio_irq_mask,
.irq_unmask = zynq_gpio_irq_unmask,
.irq_set_type = zynq_gpio_set_irq_type,
@@ -583,20 +636,24 @@ static struct irq_chip zynq_gpio_level_irqchip = {
.irq_request_resources = zynq_gpio_irq_reqres,
.irq_release_resources = zynq_gpio_irq_relres,
.flags = IRQCHIP_EOI_THREADED | IRQCHIP_EOI_IF_HANDLED |
- IRQCHIP_MASK_ON_SUSPEND,
+ IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE,
};
static struct irq_chip zynq_gpio_edge_irqchip = {
- .name = DRIVER_NAME,
+ .name = DRIVER_NAME "-edge",
.irq_enable = zynq_gpio_irq_enable,
+#ifdef CONFIG_IPIPE
+ .irq_mask_ack = zynq_gpio_hold_irq,
+#else
.irq_ack = zynq_gpio_irq_ack,
+#endif
.irq_mask = zynq_gpio_irq_mask,
.irq_unmask = zynq_gpio_irq_unmask,
.irq_set_type = zynq_gpio_set_irq_type,
.irq_set_wake = zynq_gpio_set_wake,
.irq_request_resources = zynq_gpio_irq_reqres,
.irq_release_resources = zynq_gpio_irq_relres,
- .flags = IRQCHIP_MASK_ON_SUSPEND,
+ .flags = IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE,
};
static void zynq_gpio_handle_bank_irq(struct zynq_gpio *gpio,
@@ -614,7 +671,7 @@ static void zynq_gpio_handle_bank_irq(struct zynq_gpio *gpio,
unsigned int gpio_irq;
gpio_irq = irq_find_mapping(irqdomain, offset + bank_offset);
- generic_handle_irq(gpio_irq);
+ ipipe_handle_demuxed_irq(gpio_irq);
}
}
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index 125f7bb67bee..39d6f1204164 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -2169,7 +2169,7 @@ static void ring_destroy(struct intel_engine_cs *engine)
kfree(engine);
}
-static void setup_irq(struct intel_engine_cs *engine)
+static void gt_setup_irq(struct intel_engine_cs *engine)
{
struct drm_i915_private *i915 = engine->i915;
@@ -2195,7 +2195,7 @@ static void setup_common(struct intel_engine_cs *engine)
/* gen8+ are only supported with execlists */
GEM_BUG_ON(INTEL_GEN(i915) >= 8);
- setup_irq(engine);
+ gt_setup_irq(engine);
engine->destroy = ring_destroy;
diff --git a/drivers/gpu/ipu-v3/ipu-common.c b/drivers/gpu/ipu-v3/ipu-common.c
index 528812bf84da..a0670db351ea 100644
--- a/drivers/gpu/ipu-v3/ipu-common.c
+++ b/drivers/gpu/ipu-v3/ipu-common.c
@@ -1081,7 +1081,7 @@ static void ipu_irq_handle(struct ipu_soc *ipu, const int *regs, int num_regs)
irq = irq_linear_revmap(ipu->domain,
regs[i] * 32 + bit);
if (irq)
- generic_handle_irq(irq);
+ ipipe_handle_demuxed_irq(irq);
}
}
}
@@ -1306,6 +1306,7 @@ static int ipu_irq_init(struct ipu_soc *ipu)
ct->chip.irq_ack = irq_gc_ack_set_bit;
ct->chip.irq_mask = irq_gc_mask_clr_bit;
ct->chip.irq_unmask = irq_gc_mask_set_bit;
+ ct->chip.flags = IRQCHIP_PIPELINE_SAFE;
ct->regs.ack = IPU_INT_STAT(i / 32);
ct->regs.mask = IPU_INT_CTRL(i / 32);
}
diff --git a/drivers/gpu/ipu-v3/ipu-prv.h b/drivers/gpu/ipu-v3/ipu-prv.h
index 291ac1bab66d..95edf23b95ef 100644
--- a/drivers/gpu/ipu-v3/ipu-prv.h
+++ b/drivers/gpu/ipu-v3/ipu-prv.h
@@ -170,7 +170,7 @@ struct ipu_soc {
struct device *dev;
const struct ipu_devtype *devtype;
enum ipuv3_type ipu_type;
- spinlock_t lock;
+ ipipe_spinlock_t lock;
struct mutex channel_lock;
struct list_head channels;
diff --git a/drivers/irqchip/irq-atmel-aic.c b/drivers/irqchip/irq-atmel-aic.c
deleted file mode 100644
index bb1ad451392f..000000000000
--- a/drivers/irqchip/irq-atmel-aic.c
+++ /dev/null
@@ -1,274 +0,0 @@
-/*
- * Atmel AT91 AIC (Advanced Interrupt Controller) driver
- *
- * Copyright (C) 2004 SAN People
- * Copyright (C) 2004 ATMEL
- * Copyright (C) Rick Bronson
- * Copyright (C) 2014 Free Electrons
- *
- * Author: Boris BREZILLON <boris.brezillon@free-electrons.com>
- *
- * This file is licensed under the terms of the GNU General Public
- * License version 2. This program is licensed "as is" without any
- * warranty of any kind, whether express or implied.
- */
-
-#include <linux/init.h>
-#include <linux/module.h>
-#include <linux/mm.h>
-#include <linux/bitmap.h>
-#include <linux/types.h>
-#include <linux/irq.h>
-#include <linux/irqchip.h>
-#include <linux/of.h>
-#include <linux/of_address.h>
-#include <linux/of_irq.h>
-#include <linux/irqdomain.h>
-#include <linux/err.h>
-#include <linux/slab.h>
-#include <linux/io.h>
-
-#include <asm/exception.h>
-#include <asm/mach/irq.h>
-
-#include "irq-atmel-aic-common.h"
-
-/* Number of irq lines managed by AIC */
-#define NR_AIC_IRQS 32
-
-#define AT91_AIC_SMR(n) ((n) * 4)
-
-#define AT91_AIC_SVR(n) (0x80 + ((n) * 4))
-#define AT91_AIC_IVR 0x100
-#define AT91_AIC_FVR 0x104
-#define AT91_AIC_ISR 0x108
-
-#define AT91_AIC_IPR 0x10c
-#define AT91_AIC_IMR 0x110
-#define AT91_AIC_CISR 0x114
-
-#define AT91_AIC_IECR 0x120
-#define AT91_AIC_IDCR 0x124
-#define AT91_AIC_ICCR 0x128
-#define AT91_AIC_ISCR 0x12c
-#define AT91_AIC_EOICR 0x130
-#define AT91_AIC_SPU 0x134
-#define AT91_AIC_DCR 0x138
-
-static struct irq_domain *aic_domain;
-
-static asmlinkage void __exception_irq_entry
-aic_handle(struct pt_regs *regs)
-{
- struct irq_domain_chip_generic *dgc = aic_domain->gc;
- struct irq_chip_generic *gc = dgc->gc[0];
- u32 irqnr;
- u32 irqstat;
-
- irqnr = irq_reg_readl(gc, AT91_AIC_IVR);
- irqstat = irq_reg_readl(gc, AT91_AIC_ISR);
-
- if (!irqstat)
- irq_reg_writel(gc, 0, AT91_AIC_EOICR);
- else
- handle_domain_irq(aic_domain, irqnr, regs);
-}
-
-static int aic_retrigger(struct irq_data *d)
-{
- struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
-
- /* Enable interrupt on AIC5 */
- irq_gc_lock(gc);
- irq_reg_writel(gc, d->mask, AT91_AIC_ISCR);
- irq_gc_unlock(gc);
-
- return 0;
-}
-
-static int aic_set_type(struct irq_data *d, unsigned type)
-{
- struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
- unsigned int smr;
- int ret;
-
- smr = irq_reg_readl(gc, AT91_AIC_SMR(d->hwirq));
- ret = aic_common_set_type(d, type, &smr);
- if (ret)
- return ret;
-
- irq_reg_writel(gc, smr, AT91_AIC_SMR(d->hwirq));
-
- return 0;
-}
-
-#ifdef CONFIG_PM
-static void aic_suspend(struct irq_data *d)
-{
- struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
-
- irq_gc_lock(gc);
- irq_reg_writel(gc, gc->mask_cache, AT91_AIC_IDCR);
- irq_reg_writel(gc, gc->wake_active, AT91_AIC_IECR);
- irq_gc_unlock(gc);
-}
-
-static void aic_resume(struct irq_data *d)
-{
- struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
-
- irq_gc_lock(gc);
- irq_reg_writel(gc, gc->wake_active, AT91_AIC_IDCR);
- irq_reg_writel(gc, gc->mask_cache, AT91_AIC_IECR);
- irq_gc_unlock(gc);
-}
-
-static void aic_pm_shutdown(struct irq_data *d)
-{
- struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
-
- irq_gc_lock(gc);
- irq_reg_writel(gc, 0xffffffff, AT91_AIC_IDCR);
- irq_reg_writel(gc, 0xffffffff, AT91_AIC_ICCR);
- irq_gc_unlock(gc);
-}
-#else
-#define aic_suspend NULL
-#define aic_resume NULL
-#define aic_pm_shutdown NULL
-#endif /* CONFIG_PM */
-
-static void __init aic_hw_init(struct irq_domain *domain)
-{
- struct irq_chip_generic *gc = irq_get_domain_generic_chip(domain, 0);
- int i;
-
- /*
- * Perform 8 End Of Interrupt Command to make sure AIC
- * will not Lock out nIRQ
- */
- for (i = 0; i < 8; i++)
- irq_reg_writel(gc, 0, AT91_AIC_EOICR);
-
- /*
- * Spurious Interrupt ID in Spurious Vector Register.
- * When there is no current interrupt, the IRQ Vector Register
- * reads the value stored in AIC_SPU
- */
- irq_reg_writel(gc, 0xffffffff, AT91_AIC_SPU);
-
- /* No debugging in AIC: Debug (Protect) Control Register */
- irq_reg_writel(gc, 0, AT91_AIC_DCR);
-
- /* Disable and clear all interrupts initially */
- irq_reg_writel(gc, 0xffffffff, AT91_AIC_IDCR);
- irq_reg_writel(gc, 0xffffffff, AT91_AIC_ICCR);
-
- for (i = 0; i < 32; i++)
- irq_reg_writel(gc, i, AT91_AIC_SVR(i));
-}
-
-static int aic_irq_domain_xlate(struct irq_domain *d,
- struct device_node *ctrlr,
- const u32 *intspec, unsigned int intsize,
- irq_hw_number_t *out_hwirq,
- unsigned int *out_type)
-{
- struct irq_domain_chip_generic *dgc = d->gc;
- struct irq_chip_generic *gc;
- unsigned long flags;
- unsigned smr;
- int idx;
- int ret;
-
- if (!dgc)
- return -EINVAL;
-
- ret = aic_common_irq_domain_xlate(d, ctrlr, intspec, intsize,
- out_hwirq, out_type);
- if (ret)
- return ret;
-
- idx = intspec[0] / dgc->irqs_per_chip;
- if (idx >= dgc->num_chips)
- return -EINVAL;
-
- gc = dgc->gc[idx];
-
- irq_gc_lock_irqsave(gc, flags);
- smr = irq_reg_readl(gc, AT91_AIC_SMR(*out_hwirq));
- aic_common_set_priority(intspec[2], &smr);
- irq_reg_writel(gc, smr, AT91_AIC_SMR(*out_hwirq));
- irq_gc_unlock_irqrestore(gc, flags);
-
- return ret;
-}
-
-static const struct irq_domain_ops aic_irq_ops = {
- .map = irq_map_generic_chip,
- .xlate = aic_irq_domain_xlate,
-};
-
-static void __init at91rm9200_aic_irq_fixup(void)
-{
- aic_common_rtc_irq_fixup();
-}
-
-static void __init at91sam9260_aic_irq_fixup(void)
-{
- aic_common_rtt_irq_fixup();
-}
-
-static void __init at91sam9g45_aic_irq_fixup(void)
-{
- aic_common_rtc_irq_fixup();
- aic_common_rtt_irq_fixup();
-}
-
-static const struct of_device_id aic_irq_fixups[] __initconst = {
- { .compatible = "atmel,at91rm9200", .data = at91rm9200_aic_irq_fixup },
- { .compatible = "atmel,at91sam9g45", .data = at91sam9g45_aic_irq_fixup },
- { .compatible = "atmel,at91sam9n12", .data = at91rm9200_aic_irq_fixup },
- { .compatible = "atmel,at91sam9rl", .data = at91sam9g45_aic_irq_fixup },
- { .compatible = "atmel,at91sam9x5", .data = at91rm9200_aic_irq_fixup },
- { .compatible = "atmel,at91sam9260", .data = at91sam9260_aic_irq_fixup },
- { .compatible = "atmel,at91sam9261", .data = at91sam9260_aic_irq_fixup },
- { .compatible = "atmel,at91sam9263", .data = at91sam9260_aic_irq_fixup },
- { .compatible = "atmel,at91sam9g20", .data = at91sam9260_aic_irq_fixup },
- { /* sentinel */ },
-};
-
-static int __init aic_of_init(struct device_node *node,
- struct device_node *parent)
-{
- struct irq_chip_generic *gc;
- struct irq_domain *domain;
-
- if (aic_domain)
- return -EEXIST;
-
- domain = aic_common_of_init(node, &aic_irq_ops, "atmel-aic",
- NR_AIC_IRQS, aic_irq_fixups);
- if (IS_ERR(domain))
- return PTR_ERR(domain);
-
- aic_domain = domain;
- gc = irq_get_domain_generic_chip(domain, 0);
-
- gc->chip_types[0].regs.eoi = AT91_AIC_EOICR;
- gc->chip_types[0].regs.enable = AT91_AIC_IECR;
- gc->chip_types[0].regs.disable = AT91_AIC_IDCR;
- gc->chip_types[0].chip.irq_mask = irq_gc_mask_disable_reg;
- gc->chip_types[0].chip.irq_unmask = irq_gc_unmask_enable_reg;
- gc->chip_types[0].chip.irq_retrigger = aic_retrigger;
- gc->chip_types[0].chip.irq_set_type = aic_set_type;
- gc->chip_types[0].chip.irq_suspend = aic_suspend;
- gc->chip_types[0].chip.irq_resume = aic_resume;
- gc->chip_types[0].chip.irq_pm_shutdown = aic_pm_shutdown;
-
- aic_hw_init(domain);
- set_handle_irq(aic_handle);
-
- return 0;
-}
-IRQCHIP_DECLARE(at91rm9200_aic, "atmel,at91rm9200-aic", aic_of_init);
diff --git a/drivers/irqchip/irq-atmel-aic5.c b/drivers/irqchip/irq-atmel-aic5.c
index 29333497ba10..b902762126b7 100644
--- a/drivers/irqchip/irq-atmel-aic5.c
+++ b/drivers/irqchip/irq-atmel-aic5.c
@@ -80,7 +80,7 @@ aic5_handle(struct pt_regs *regs)
if (!irqstat)
irq_reg_writel(bgc, 0, AT91_AIC5_EOICR);
else
- handle_domain_irq(aic5_domain, irqnr, regs);
+ ipipe_handle_domain_irq(aic5_domain, irqnr, regs);
}
static void aic5_mask(struct irq_data *d)
@@ -88,16 +88,18 @@ static void aic5_mask(struct irq_data *d)
struct irq_domain *domain = d->domain;
struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ unsigned long flags;
/*
* Disable interrupt on AIC5. We always take the lock of the
* first irq chip as all chips share the same registers.
*/
- irq_gc_lock(bgc);
+ flags = irq_gc_lock(bgc);
+ ipipe_lock_irq(d->irq);
irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
irq_reg_writel(gc, 1, AT91_AIC5_IDCR);
gc->mask_cache &= ~d->mask;
- irq_gc_unlock(bgc);
+ irq_gc_unlock(bgc, flags);
}
static void aic5_unmask(struct irq_data *d)
@@ -105,28 +107,59 @@ static void aic5_unmask(struct irq_data *d)
struct irq_domain *domain = d->domain;
struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ unsigned long flags;
/*
* Enable interrupt on AIC5. We always take the lock of the
* first irq chip as all chips share the same registers.
*/
- irq_gc_lock(bgc);
+ flags = irq_gc_lock(bgc);
irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
irq_reg_writel(gc, 1, AT91_AIC5_IECR);
gc->mask_cache |= d->mask;
- irq_gc_unlock(bgc);
+ ipipe_unlock_irq(d->irq);
+ irq_gc_unlock(bgc, flags);
+}
+
+#ifdef CONFIG_IPIPE
+
+static void aic5_hold(struct irq_data *d)
+{
+ struct irq_domain *domain = d->domain;
+ struct irq_domain_chip_generic *dgc = domain->gc;
+ struct irq_chip_generic *gc = dgc->gc[0];
+
+ irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
+ irq_reg_writel(gc, 1, AT91_AIC5_IDCR);
+ irq_reg_writel(gc, 0, AT91_AIC5_EOICR);
+}
+
+static void aic5_release(struct irq_data *d)
+{
+ struct irq_domain *domain = d->domain;
+ struct irq_domain_chip_generic *dgc = domain->gc;
+ struct irq_chip_generic *gc = dgc->gc[0];
+ unsigned long flags;
+
+ flags = irq_gc_lock(gc);
+ irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
+ irq_reg_writel(gc, 1, AT91_AIC5_IECR);
+ irq_gc_unlock(gc, flags);
}
+#endif
+
static int aic5_retrigger(struct irq_data *d)
{
struct irq_domain *domain = d->domain;
struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
+ unsigned long flags;
/* Enable interrupt on AIC5 */
- irq_gc_lock(bgc);
+ flags = irq_gc_lock(bgc);
irq_reg_writel(bgc, d->hwirq, AT91_AIC5_SSR);
irq_reg_writel(bgc, 1, AT91_AIC5_ISCR);
- irq_gc_unlock(bgc);
+ irq_gc_unlock(bgc, flags);
return 0;
}
@@ -135,16 +168,17 @@ static int aic5_set_type(struct irq_data *d, unsigned type)
{
struct irq_domain *domain = d->domain;
struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
+ unsigned long flags;
unsigned int smr;
int ret;
- irq_gc_lock(bgc);
+ flags = irq_gc_lock(bgc);
irq_reg_writel(bgc, d->hwirq, AT91_AIC5_SSR);
smr = irq_reg_readl(bgc, AT91_AIC5_SMR);
ret = aic_common_set_type(d, type, &smr);
if (!ret)
irq_reg_writel(bgc, smr, AT91_AIC5_SMR);
- irq_gc_unlock(bgc);
+ irq_gc_unlock(bgc, flags);
return ret;
}
@@ -160,6 +194,7 @@ static void aic5_suspend(struct irq_data *d)
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
int i;
u32 mask;
+ unsigned long flags;
if (smr_cache)
for (i = 0; i < domain->revmap_size; i++) {
@@ -167,7 +202,7 @@ static void aic5_suspend(struct irq_data *d)
smr_cache[i] = irq_reg_readl(bgc, AT91_AIC5_SMR);
}
- irq_gc_lock(bgc);
+ flags = irq_gc_lock(bgc);
for (i = 0; i < dgc->irqs_per_chip; i++) {
mask = 1 << i;
if ((mask & gc->mask_cache) == (mask & gc->wake_active))
@@ -179,7 +214,7 @@ static void aic5_suspend(struct irq_data *d)
else
irq_reg_writel(bgc, 1, AT91_AIC5_IDCR);
}
- irq_gc_unlock(bgc);
+ irq_gc_unlock(bgc, flags);
}
static void aic5_resume(struct irq_data *d)
@@ -190,8 +225,9 @@ static void aic5_resume(struct irq_data *d)
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
int i;
u32 mask;
+ unsigned long flags;
- irq_gc_lock(bgc);
+ flags = irq_gc_lock(bgc);
if (smr_cache) {
irq_reg_writel(bgc, 0xffffffff, AT91_AIC5_SPU);
@@ -215,7 +251,7 @@ static void aic5_resume(struct irq_data *d)
else
irq_reg_writel(bgc, 1, AT91_AIC5_IDCR);
}
- irq_gc_unlock(bgc);
+ irq_gc_unlock(bgc, flags);
}
static void aic5_pm_shutdown(struct irq_data *d)
@@ -224,15 +260,16 @@ static void aic5_pm_shutdown(struct irq_data *d)
struct irq_domain_chip_generic *dgc = domain->gc;
struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ unsigned long flags;
int i;
- irq_gc_lock(bgc);
+ flags = irq_gc_lock(bgc);
for (i = 0; i < dgc->irqs_per_chip; i++) {
irq_reg_writel(bgc, i + gc->irq_base, AT91_AIC5_SSR);
irq_reg_writel(bgc, 1, AT91_AIC5_IDCR);
irq_reg_writel(bgc, 1, AT91_AIC5_ICCR);
}
- irq_gc_unlock(bgc);
+ irq_gc_unlock(bgc, flags);
}
#else
#define aic5_suspend NULL
@@ -350,6 +387,11 @@ static int __init aic5_of_init(struct device_node *node,
gc->chip_types[0].chip.irq_suspend = aic5_suspend;
gc->chip_types[0].chip.irq_resume = aic5_resume;
gc->chip_types[0].chip.irq_pm_shutdown = aic5_pm_shutdown;
+#ifdef CONFIG_IPIPE
+ gc->chip_types[0].chip.irq_hold = aic5_hold;
+ gc->chip_types[0].chip.irq_release = aic5_release;
+ gc->chip_types[0].chip.flags = IRQCHIP_PIPELINE_SAFE;
+#endif
}
aic5_hw_init(domain);
diff --git a/drivers/irqchip/irq-bcm2835.c b/drivers/irqchip/irq-bcm2835.c
index 418245d31921..3930dafa5d12 100644
--- a/drivers/irqchip/irq-bcm2835.c
+++ b/drivers/irqchip/irq-bcm2835.c
@@ -101,7 +101,12 @@ static void armctrl_unmask_irq(struct irq_data *d)
static struct irq_chip armctrl_chip = {
.name = "ARMCTRL-level",
.irq_mask = armctrl_mask_irq,
- .irq_unmask = armctrl_unmask_irq
+ .irq_unmask = armctrl_unmask_irq,
+#ifdef CONFIG_IPIPE
+ .irq_hold = armctrl_mask_irq,
+ .irq_release = armctrl_unmask_irq,
+#endif
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static int armctrl_xlate(struct irq_domain *d, struct device_node *ctrlr,
@@ -231,7 +236,7 @@ static void __exception_irq_entry bcm2835_handle_irq(
u32 hwirq;
while ((hwirq = get_next_armctrl_hwirq()) != ~0)
- handle_domain_irq(intc.domain, hwirq, regs);
+ ipipe_handle_domain_irq(intc.domain, hwirq, regs);
}
static void bcm2836_chained_handle_irq(struct irq_desc *desc)
@@ -239,7 +244,7 @@ static void bcm2836_chained_handle_irq(struct irq_desc *desc)
u32 hwirq;
while ((hwirq = get_next_armctrl_hwirq()) != ~0)
- generic_handle_irq(irq_linear_revmap(intc.domain, hwirq));
+ ipipe_handle_demuxed_irq(irq_linear_revmap(intc.domain, hwirq));
}
IRQCHIP_DECLARE(bcm2835_armctrl_ic, "brcm,bcm2835-armctrl-ic",
diff --git a/drivers/irqchip/irq-bcm2836.c b/drivers/irqchip/irq-bcm2836.c
index 2038693f074c..330ff7925b51 100644
--- a/drivers/irqchip/irq-bcm2836.c
+++ b/drivers/irqchip/irq-bcm2836.c
@@ -39,40 +39,68 @@ static void bcm2836_arm_irqchip_unmask_per_cpu_irq(unsigned int reg_offset,
writel(readl(reg) | BIT(bit), reg);
}
-static void bcm2836_arm_irqchip_mask_timer_irq(struct irq_data *d)
+static void __bcm2836_arm_irqchip_mask_timer_irq(struct irq_data *d)
{
bcm2836_arm_irqchip_mask_per_cpu_irq(LOCAL_TIMER_INT_CONTROL0,
d->hwirq - LOCAL_IRQ_CNTPSIRQ,
- smp_processor_id());
+ raw_smp_processor_id());
}
-static void bcm2836_arm_irqchip_unmask_timer_irq(struct irq_data *d)
+static void bcm2836_arm_irqchip_mask_timer_irq(struct irq_data *d)
+{
+ unsigned long flags;
+
+ flags = hard_local_irq_save();
+ __bcm2836_arm_irqchip_mask_timer_irq(d);
+ hard_local_irq_restore(flags);
+}
+
+static void __bcm2836_arm_irqchip_unmask_timer_irq(struct irq_data *d)
{
bcm2836_arm_irqchip_unmask_per_cpu_irq(LOCAL_TIMER_INT_CONTROL0,
d->hwirq - LOCAL_IRQ_CNTPSIRQ,
- smp_processor_id());
+ raw_smp_processor_id());
+}
+
+static void bcm2836_arm_irqchip_unmask_timer_irq(struct irq_data *d)
+{
+ unsigned long flags;
+
+ flags = hard_local_irq_save();
+ __bcm2836_arm_irqchip_unmask_timer_irq(d);
+ hard_local_irq_restore(flags);
}
static struct irq_chip bcm2836_arm_irqchip_timer = {
.name = "bcm2836-timer",
.irq_mask = bcm2836_arm_irqchip_mask_timer_irq,
.irq_unmask = bcm2836_arm_irqchip_unmask_timer_irq,
+#ifdef CONFIG_IPIPE
+ .irq_hold = __bcm2836_arm_irqchip_mask_timer_irq,
+ .irq_release = __bcm2836_arm_irqchip_unmask_timer_irq,
+#endif
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static void bcm2836_arm_irqchip_mask_pmu_irq(struct irq_data *d)
{
- writel(1 << smp_processor_id(), intc.base + LOCAL_PM_ROUTING_CLR);
+ writel(1 << raw_smp_processor_id(), intc.base + LOCAL_PM_ROUTING_CLR);
}
static void bcm2836_arm_irqchip_unmask_pmu_irq(struct irq_data *d)
{
- writel(1 << smp_processor_id(), intc.base + LOCAL_PM_ROUTING_SET);
+ writel(1 << raw_smp_processor_id(), intc.base + LOCAL_PM_ROUTING_SET);
}
static struct irq_chip bcm2836_arm_irqchip_pmu = {
.name = "bcm2836-pmu",
.irq_mask = bcm2836_arm_irqchip_mask_pmu_irq,
.irq_unmask = bcm2836_arm_irqchip_unmask_pmu_irq,
+#ifdef CONFIG_IPIPE
+ .irq_hold = bcm2836_arm_irqchip_mask_pmu_irq,
+ .irq_release = bcm2836_arm_irqchip_unmask_pmu_irq,
+#endif
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static void bcm2836_arm_irqchip_mask_gpu_irq(struct irq_data *d)
@@ -87,6 +115,11 @@ static struct irq_chip bcm2836_arm_irqchip_gpu = {
.name = "bcm2836-gpu",
.irq_mask = bcm2836_arm_irqchip_mask_gpu_irq,
.irq_unmask = bcm2836_arm_irqchip_unmask_gpu_irq,
+#ifdef CONFIG_IPIPE
+ .irq_hold = bcm2836_arm_irqchip_mask_gpu_irq,
+ .irq_release = bcm2836_arm_irqchip_unmask_gpu_irq,
+#endif
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static int bcm2836_map(struct irq_domain *d, unsigned int irq,
@@ -123,7 +156,7 @@ static int bcm2836_map(struct irq_domain *d, unsigned int irq,
static void
__exception_irq_entry bcm2836_arm_irqchip_handle_irq(struct pt_regs *regs)
{
- int cpu = smp_processor_id();
+ int cpu = raw_smp_processor_id();
u32 stat;
stat = readl_relaxed(intc.base + LOCAL_IRQ_PENDING0 + 4 * cpu);
@@ -135,12 +168,12 @@ __exception_irq_entry bcm2836_arm_irqchip_handle_irq(struct pt_regs *regs)
u32 ipi = ffs(mbox_val) - 1;
writel(1 << ipi, mailbox0);
- handle_IPI(ipi, regs);
+ ipipe_handle_multi_ipi(ipi, regs);
#endif
} else if (stat) {
u32 hwirq = ffs(stat) - 1;
- handle_domain_irq(intc.domain, hwirq, regs);
+ ipipe_handle_domain_irq(intc.domain, hwirq, regs);
}
}
diff --git a/drivers/irqchip/irq-bcm7120-l2.c b/drivers/irqchip/irq-bcm7120-l2.c
index d308bbe6f528..d09aed711490 100644
--- a/drivers/irqchip/irq-bcm7120-l2.c
+++ b/drivers/irqchip/irq-bcm7120-l2.c
@@ -58,6 +58,7 @@ static void bcm7120_l2_intc_irq_handle(struct irq_desc *desc)
struct bcm7120_l2_intc_data *b = data->b;
struct irq_chip *chip = irq_desc_get_chip(desc);
unsigned int idx;
+ unsigned long flags;
chained_irq_enter(chip, desc);
@@ -68,11 +69,11 @@ static void bcm7120_l2_intc_irq_handle(struct irq_desc *desc)
unsigned long pending;
int hwirq;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
pending = irq_reg_readl(gc, b->stat_offset[idx]) &
gc->mask_cache &
data->irq_map_mask[idx];
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
for_each_set_bit(hwirq, &pending, IRQS_PER_WORD) {
generic_handle_irq(irq_find_mapping(b->domain,
@@ -87,22 +88,24 @@ static void bcm7120_l2_intc_suspend(struct irq_chip_generic *gc)
{
struct bcm7120_l2_intc_data *b = gc->private;
struct irq_chip_type *ct = gc->chip_types;
+ unsigned long flags;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
if (b->can_wake)
irq_reg_writel(gc, gc->mask_cache | gc->wake_active,
ct->regs.mask);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
static void bcm7120_l2_intc_resume(struct irq_chip_generic *gc)
{
struct irq_chip_type *ct = gc->chip_types;
+ unsigned long flags;
/* Restore the saved mask */
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
irq_reg_writel(gc, gc->mask_cache, ct->regs.mask);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
static int bcm7120_l2_intc_init_one(struct device_node *dn,
diff --git a/drivers/irqchip/irq-brcmstb-l2.c b/drivers/irqchip/irq-brcmstb-l2.c
index f803ecb6a0fa..0201b94aa9f4 100644
--- a/drivers/irqchip/irq-brcmstb-l2.c
+++ b/drivers/irqchip/irq-brcmstb-l2.c
@@ -123,7 +123,7 @@ static void brcmstb_l2_intc_suspend(struct irq_data *d)
struct brcmstb_l2_intc_data *b = gc->private;
unsigned long flags;
- irq_gc_lock_irqsave(gc, flags);
+ flags = irq_gc_lock(gc);
/* Save the current mask */
b->saved_mask = irq_reg_readl(gc, ct->regs.mask);
@@ -132,7 +132,7 @@ static void brcmstb_l2_intc_suspend(struct irq_data *d)
irq_reg_writel(gc, ~gc->wake_active, ct->regs.disable);
irq_reg_writel(gc, gc->wake_active, ct->regs.enable);
}
- irq_gc_unlock_irqrestore(gc, flags);
+ irq_gc_unlock(gc, flags);
}
static void brcmstb_l2_intc_resume(struct irq_data *d)
@@ -142,7 +142,7 @@ static void brcmstb_l2_intc_resume(struct irq_data *d)
struct brcmstb_l2_intc_data *b = gc->private;
unsigned long flags;
- irq_gc_lock_irqsave(gc, flags);
+ flags = irq_gc_lock(gc);
if (ct->chip.irq_ack) {
/* Clear unmasked non-wakeup interrupts */
irq_reg_writel(gc, ~b->saved_mask & ~gc->wake_active,
@@ -152,7 +152,7 @@ static void brcmstb_l2_intc_resume(struct irq_data *d)
/* Restore the saved mask */
irq_reg_writel(gc, b->saved_mask, ct->regs.disable);
irq_reg_writel(gc, ~b->saved_mask, ct->regs.enable);
- irq_gc_unlock_irqrestore(gc, flags);
+ irq_gc_unlock(gc, flags);
}
static int __init brcmstb_l2_intc_of_init(struct device_node *np,
diff --git a/drivers/irqchip/irq-crossbar.c b/drivers/irqchip/irq-crossbar.c
index a05a7501e107..9c43dc674335 100644
--- a/drivers/irqchip/irq-crossbar.c
+++ b/drivers/irqchip/irq-crossbar.c
@@ -12,6 +12,7 @@
#include <linux/of_address.h>
#include <linux/of_irq.h>
#include <linux/slab.h>
+#include <linux/ipipe.h>
#define IRQ_FREE -1
#define IRQ_RESERVED -2
@@ -65,10 +66,15 @@ static struct irq_chip crossbar_chip = {
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_set_type = irq_chip_set_type_parent,
.flags = IRQCHIP_MASK_ON_SUSPEND |
- IRQCHIP_SKIP_SET_WAKE,
+ IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_PIPELINE_SAFE,
#ifdef CONFIG_SMP
.irq_set_affinity = irq_chip_set_affinity_parent,
#endif
+#ifdef CONFIG_IPIPE
+ .irq_hold = irq_chip_hold_parent,
+ .irq_release = irq_chip_release_parent,
+#endif
};
static int allocate_gic_irq(struct irq_domain *domain, unsigned virq,
diff --git a/drivers/irqchip/irq-dw-apb-ictl.c b/drivers/irqchip/irq-dw-apb-ictl.c
index e4550e9c810b..01923eca46e5 100644
--- a/drivers/irqchip/irq-dw-apb-ictl.c
+++ b/drivers/irqchip/irq-dw-apb-ictl.c
@@ -17,6 +17,7 @@
#include <linux/irqchip/chained_irq.h>
#include <linux/of_address.h>
#include <linux/of_irq.h>
+#include <linux/ipipe.h>
#define APB_INT_ENABLE_L 0x00
#define APB_INT_ENABLE_H 0x04
@@ -42,7 +43,7 @@ static void dw_apb_ictl_handler(struct irq_desc *desc)
u32 hwirq = ffs(stat) - 1;
u32 virq = irq_find_mapping(d, gc->irq_base + hwirq);
- generic_handle_irq(virq);
+ ipipe_handle_demuxed_irq(virq);
stat &= ~(1 << hwirq);
}
}
@@ -55,11 +56,12 @@ static void dw_apb_ictl_resume(struct irq_data *d)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct irq_chip_type *ct = irq_data_get_chip_type(d);
+ unsigned long flags;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
writel_relaxed(~0, gc->reg_base + ct->regs.enable);
writel_relaxed(*ct->mask_cache, gc->reg_base + ct->regs.mask);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
#else
#define dw_apb_ictl_resume NULL
@@ -144,6 +146,7 @@ static int __init dw_apb_ictl_init(struct device_node *np,
gc->chip_types[0].chip.irq_mask = irq_gc_mask_set_bit;
gc->chip_types[0].chip.irq_unmask = irq_gc_mask_clr_bit;
gc->chip_types[0].chip.irq_resume = dw_apb_ictl_resume;
+ gc->chip_types[0].chip.flags |= IRQCHIP_PIPELINE_SAFE;
}
irq_set_chained_handler_and_data(irq, dw_apb_ictl_handler, domain);
diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index e88e75c22b6a..ca18130d9818 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -72,14 +72,22 @@ struct v2m_data {
static void gicv2m_mask_msi_irq(struct irq_data *d)
{
+ unsigned long flags;
+
+ flags = hard_cond_local_irq_save();
pci_msi_mask_irq(d);
irq_chip_mask_parent(d);
+ hard_cond_local_irq_restore(flags);
}
static void gicv2m_unmask_msi_irq(struct irq_data *d)
{
+ unsigned long flags;
+
+ flags = hard_cond_local_irq_save();
pci_msi_unmask_irq(d);
irq_chip_unmask_parent(d);
+ hard_cond_local_irq_restore(flags);
}
static struct irq_chip gicv2m_msi_irq_chip = {
@@ -88,6 +96,11 @@ static struct irq_chip gicv2m_msi_irq_chip = {
.irq_unmask = gicv2m_unmask_msi_irq,
.irq_eoi = irq_chip_eoi_parent,
.irq_write_msi_msg = pci_msi_domain_write_msg,
+#ifdef CONFIG_IPIPE
+ .irq_hold = irq_chip_hold_parent,
+ .irq_release = irq_chip_release_parent,
+#endif
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static struct msi_domain_info gicv2m_msi_domain_info = {
@@ -129,6 +142,11 @@ static struct irq_chip gicv2m_irq_chip = {
.irq_eoi = irq_chip_eoi_parent,
.irq_set_affinity = irq_chip_set_affinity_parent,
.irq_compose_msi_msg = gicv2m_compose_msi_msg,
+#ifdef CONFIG_IPIPE
+ .irq_hold = irq_chip_hold_parent,
+ .irq_release = irq_chip_release_parent,
+#endif
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static int gicv2m_irq_gic_domain_alloc(struct irq_domain *domain,
@@ -251,6 +269,7 @@ static bool is_msi_spi_valid(u32 base, u32 num)
static struct irq_chip gicv2m_pmsi_irq_chip = {
.name = "pMSI",
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static struct msi_domain_ops gicv2m_pmsi_ops = {
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 77a130c03223..ec7b60cccf2b 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -328,7 +328,12 @@ static void gic_poke_irq(struct irq_data *d, u32 offset)
static void gic_mask_irq(struct irq_data *d)
{
+ unsigned long flags;
+
+ flags = hard_cond_local_irq_save();
+ ipipe_lock_irq(d->irq);
gic_poke_irq(d, GICD_ICENABLER);
+ hard_cond_local_irq_restore(flags);
}
static void gic_eoimode1_mask_irq(struct irq_data *d)
@@ -348,7 +353,12 @@ static void gic_eoimode1_mask_irq(struct irq_data *d)
static void gic_unmask_irq(struct irq_data *d)
{
+ unsigned long flags;
+
+ flags = hard_cond_local_irq_save();
gic_poke_irq(d, GICD_ISENABLER);
+ ipipe_unlock_irq(d->irq);
+ hard_cond_local_irq_restore(flags);
}
static inline bool gic_supports_nmi(void)
@@ -520,6 +530,27 @@ static void gic_eoimode1_eoi_irq(struct irq_data *d)
gic_write_dir(gic_irq(d));
}
+#ifdef CONFIG_IPIPE
+static void gic_hold_irq(struct irq_data *d)
+{
+ struct irq_chip *chip = irq_data_get_irq_chip(d);
+
+ gic_poke_irq(d, GICD_ICENABLER);
+
+ if (chip->irq_eoi == gic_eoimode1_eoi_irq) {
+ if (irqd_is_forwarded_to_vcpu(d))
+ gic_poke_irq(d, GICD_ICACTIVER);
+ gic_eoimode1_eoi_irq(d);
+	} else
+ gic_eoi_irq(d);
+}
+
+static void gic_release_irq(struct irq_data *d)
+{
+ gic_poke_irq(d, GICD_ISENABLER);
+}
+#endif /* CONFIG_IPIPE */
+
static int gic_set_type(struct irq_data *d, unsigned int type)
{
enum gic_intid_range range;
@@ -645,7 +676,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
else
isb();
- err = handle_domain_irq(gic_data.domain, irqnr, regs);
+ err = ipipe_handle_domain_irq(gic_data.domain, irqnr, regs);
if (err) {
WARN_ONCE(true, "Unexpected interrupt received!\n");
gic_deactivate_unhandled(irqnr);
@@ -664,7 +695,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
* that any shared data read by handle_IPI will
* be read after the ACK.
*/
- handle_IPI(irqnr, regs);
+ ipipe_handle_multi_ipi(irqnr, regs);
#else
WARN_ONCE(true, "Unexpected SGI received!\n");
#endif
@@ -1208,6 +1239,10 @@ static struct irq_chip gic_chip = {
.irq_unmask = gic_unmask_irq,
.irq_eoi = gic_eoi_irq,
.irq_set_type = gic_set_type,
+#ifdef CONFIG_IPIPE
+ .irq_hold = gic_hold_irq,
+ .irq_release = gic_release_irq,
+#endif
.irq_set_affinity = gic_set_affinity,
.irq_get_irqchip_state = gic_irq_get_irqchip_state,
.irq_set_irqchip_state = gic_irq_set_irqchip_state,
@@ -1215,6 +1250,7 @@ static struct irq_chip gic_chip = {
.irq_nmi_teardown = gic_irq_nmi_teardown,
.flags = IRQCHIP_SET_TYPE_MASKED |
IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_PIPELINE_SAFE |
IRQCHIP_MASK_ON_SUSPEND,
};
@@ -1224,6 +1260,10 @@ static struct irq_chip gic_eoimode1_chip = {
.irq_unmask = gic_unmask_irq,
.irq_eoi = gic_eoimode1_eoi_irq,
.irq_set_type = gic_set_type,
+#ifdef CONFIG_IPIPE
+ .irq_hold = gic_hold_irq,
+ .irq_release = gic_release_irq,
+#endif
.irq_set_affinity = gic_set_affinity,
.irq_get_irqchip_state = gic_irq_get_irqchip_state,
.irq_set_irqchip_state = gic_irq_set_irqchip_state,
@@ -1232,6 +1272,7 @@ static struct irq_chip gic_eoimode1_chip = {
.irq_nmi_teardown = gic_irq_nmi_teardown,
.flags = IRQCHIP_SET_TYPE_MASKED |
IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_PIPELINE_SAFE |
IRQCHIP_MASK_ON_SUSPEND,
};
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 882204d1ef4f..156e94ee1158 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -35,6 +35,7 @@
#include <linux/interrupt.h>
#include <linux/percpu.h>
#include <linux/slab.h>
+#include <linux/ipipe.h>
#include <linux/irqchip.h>
#include <linux/irqchip/chained_irq.h>
#include <linux/irqchip/arm-gic.h>
@@ -88,9 +89,17 @@ struct gic_chip_data {
#endif
};
+#ifdef CONFIG_IPIPE
+#define pipeline_lock(__flags) do { (__flags) = hard_local_irq_save(); } while (0)
+#define pipeline_unlock(__flags) hard_local_irq_restore(__flags)
+#else
+#define pipeline_lock(__flags) do { (void)__flags; } while (0)
+#define pipeline_unlock(__flags) do { (void)__flags; } while (0)
+#endif
+
#ifdef CONFIG_BL_SWITCHER
-static DEFINE_RAW_SPINLOCK(cpu_map_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(cpu_map_lock);
#define gic_lock_irqsave(f) \
raw_spin_lock_irqsave(&cpu_map_lock, (f))
@@ -201,7 +210,12 @@ static int gic_peek_irq(struct irq_data *d, u32 offset)
static void gic_mask_irq(struct irq_data *d)
{
+ unsigned long flags;
+
+ pipeline_lock(flags);
+ ipipe_lock_irq(d->irq);
gic_poke_irq(d, GIC_DIST_ENABLE_CLEAR);
+ pipeline_unlock(flags);
}
static void gic_eoimode1_mask_irq(struct irq_data *d)
@@ -221,7 +235,12 @@ static void gic_eoimode1_mask_irq(struct irq_data *d)
static void gic_unmask_irq(struct irq_data *d)
{
+ unsigned long flags;
+
+ pipeline_lock(flags);
gic_poke_irq(d, GIC_DIST_ENABLE_SET);
+ ipipe_unlock_irq(d->irq);
+ pipeline_unlock(flags);
}
static void gic_eoi_irq(struct irq_data *d)
@@ -238,6 +257,27 @@ static void gic_eoimode1_eoi_irq(struct irq_data *d)
writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_DEACTIVATE);
}
+#ifdef CONFIG_IPIPE
+static void gic_hold_irq(struct irq_data *d)
+{
+ struct irq_chip *chip = irq_data_get_irq_chip(d);
+
+ gic_poke_irq(d, GIC_DIST_ENABLE_CLEAR);
+
+ if (chip->irq_eoi == gic_eoimode1_eoi_irq) {
+ if (irqd_is_forwarded_to_vcpu(d))
+ gic_poke_irq(d, GIC_DIST_ACTIVE_CLEAR);
+ gic_eoimode1_eoi_irq(d);
+ } else
+ gic_eoi_irq(d);
+}
+
+static void gic_release_irq(struct irq_data *d)
+{
+ gic_poke_irq(d, GIC_DIST_ENABLE_SET);
+}
+#endif /* CONFIG_IPIPE */
+
static int gic_irq_set_irqchip_state(struct irq_data *d,
enum irqchip_irq_state which, bool val)
{
@@ -361,7 +401,7 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
if (static_branch_likely(&supports_deactivate_key))
writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
isb();
- handle_domain_irq(gic->domain, irqnr, regs);
+ ipipe_handle_domain_irq(gic->domain, irqnr, regs);
continue;
}
if (irqnr < 16) {
@@ -377,7 +417,7 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
* Pairs with the write barrier in gic_raise_softirq
*/
smp_rmb();
- handle_IPI(irqnr, regs);
+ ipipe_handle_multi_ipi(irqnr, regs);
#endif
continue;
}
@@ -405,7 +445,7 @@ static void gic_handle_cascade_irq(struct irq_desc *desc)
handle_bad_irq(desc);
} else {
isb();
- generic_handle_irq(cascade_irq);
+ ipipe_handle_demuxed_irq(cascade_irq);
}
out:
@@ -417,11 +457,16 @@ static const struct irq_chip gic_chip = {
.irq_unmask = gic_unmask_irq,
.irq_eoi = gic_eoi_irq,
.irq_set_type = gic_set_type,
+#ifdef CONFIG_IPIPE
+ .irq_hold = gic_hold_irq,
+ .irq_release = gic_release_irq,
+#endif
.irq_get_irqchip_state = gic_irq_get_irqchip_state,
.irq_set_irqchip_state = gic_irq_set_irqchip_state,
.flags = IRQCHIP_SET_TYPE_MASKED |
IRQCHIP_SKIP_SET_WAKE |
- IRQCHIP_MASK_ON_SUSPEND,
+ IRQCHIP_MASK_ON_SUSPEND |
+ IRQCHIP_PIPELINE_SAFE,
};
void __init gic_cascade_irq(unsigned int gic_nr, unsigned int irq)
@@ -479,7 +524,6 @@ static void gic_cpu_if_up(struct gic_chip_data *gic)
writel_relaxed(bypass | mode | GICC_ENABLE, cpu_base + GIC_CPU_CTRL);
}
-
static void gic_dist_init(struct gic_chip_data *gic)
{
unsigned int i;
diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
index 4f74c15c4755..422fa2198e07 100644
--- a/drivers/irqchip/irq-imx-gpcv2.c
+++ b/drivers/irqchip/irq-imx-gpcv2.c
@@ -7,6 +7,7 @@
#include <linux/of_irq.h>
#include <linux/slab.h>
#include <linux/irqchip.h>
+#include <linux/ipipe.h>
#include <linux/syscore_ops.h>
#define IMR_NUM 4
@@ -19,7 +20,11 @@
struct gpcv2_irqchip_data {
+#ifdef CONFIG_IPIPE
+ ipipe_spinlock_t rlock;
+#else
struct raw_spinlock rlock;
+#endif
void __iomem *gpc_base;
u32 wakeup_sources[IMR_NUM];
u32 saved_irq_mask[IMR_NUM];
@@ -36,6 +41,7 @@ static void __iomem *gpcv2_idx_to_reg(struct gpcv2_irqchip_data *cd, int i)
static int gpcv2_wakeup_source_save(void)
{
struct gpcv2_irqchip_data *cd;
+ unsigned long flags;
void __iomem *reg;
int i;
@@ -45,8 +51,10 @@ static int gpcv2_wakeup_source_save(void)
for (i = 0; i < IMR_NUM; i++) {
reg = gpcv2_idx_to_reg(cd, i);
+ flags = hard_cond_local_irq_save();
cd->saved_irq_mask[i] = readl_relaxed(reg);
writel_relaxed(cd->wakeup_sources[i], reg);
+ hard_cond_local_irq_restore(flags);
}
return 0;
@@ -55,14 +63,18 @@ static int gpcv2_wakeup_source_save(void)
static void gpcv2_wakeup_source_restore(void)
{
struct gpcv2_irqchip_data *cd;
+ unsigned long flags;
int i;
cd = imx_gpcv2_instance;
if (!cd)
return;
- for (i = 0; i < IMR_NUM; i++)
+ for (i = 0; i < IMR_NUM; i++) {
+ flags = hard_cond_local_irq_save();
writel_relaxed(cd->saved_irq_mask[i], gpcv2_idx_to_reg(cd, i));
+ hard_cond_local_irq_restore(flags);
+ }
}
static struct syscore_ops imx_gpcv2_syscore_ops = {
@@ -92,38 +104,81 @@ static int imx_gpcv2_irq_set_wake(struct irq_data *d, unsigned int on)
return 0;
}
-static void imx_gpcv2_irq_unmask(struct irq_data *d)
+static void __imx_gpcv2_irq_unmask(struct irq_data *d)
{
struct gpcv2_irqchip_data *cd = d->chip_data;
void __iomem *reg;
u32 val;
- raw_spin_lock(&cd->rlock);
reg = gpcv2_idx_to_reg(cd, d->hwirq / 32);
val = readl_relaxed(reg);
val &= ~BIT(d->hwirq % 32);
writel_relaxed(val, reg);
- raw_spin_unlock(&cd->rlock);
irq_chip_unmask_parent(d);
}
-static void imx_gpcv2_irq_mask(struct irq_data *d)
+static void imx_gpcv2_irq_unmask(struct irq_data *d)
+{
+ struct gpcv2_irqchip_data *cd = d->chip_data;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&cd->rlock, flags);
+ __imx_gpcv2_irq_unmask(d);
+ raw_spin_unlock_irqrestore(&cd->rlock, flags);
+ irq_chip_unmask_parent(d);
+}
+
+static void __imx_gpcv2_irq_mask(struct irq_data *d)
{
struct gpcv2_irqchip_data *cd = d->chip_data;
void __iomem *reg;
u32 val;
- raw_spin_lock(&cd->rlock);
reg = gpcv2_idx_to_reg(cd, d->hwirq / 32);
val = readl_relaxed(reg);
val |= BIT(d->hwirq % 32);
writel_relaxed(val, reg);
- raw_spin_unlock(&cd->rlock);
irq_chip_mask_parent(d);
}
+static void imx_gpcv2_irq_mask(struct irq_data *d)
+{
+ struct gpcv2_irqchip_data *cd = d->chip_data;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&cd->rlock, flags);
+ __imx_gpcv2_irq_mask(d);
+ raw_spin_unlock_irqrestore(&cd->rlock, flags);
+ irq_chip_mask_parent(d);
+}
+
+#ifdef CONFIG_IPIPE
+
+static void imx_gpc_hold_irq(struct irq_data *d)
+{
+ struct gpcv2_irqchip_data *cd = d->chip_data;
+
+ raw_spin_lock(&cd->rlock);
+ __imx_gpcv2_irq_mask(d);
+ raw_spin_unlock(&cd->rlock);
+ irq_chip_hold_parent(d);
+}
+
+static void imx_gpc_release_irq(struct irq_data *d)
+{
+ struct gpcv2_irqchip_data *cd = d->chip_data;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&cd->rlock, flags);
+ __imx_gpcv2_irq_unmask(d);
+ raw_spin_unlock_irqrestore(&cd->rlock, flags);
+ irq_chip_release_parent(d);
+}
+
+#endif /* CONFIG_IPIPE */
+
static struct irq_chip gpcv2_irqchip_data_chip = {
.name = "GPCv2",
.irq_eoi = irq_chip_eoi_parent,
@@ -135,6 +190,11 @@ static struct irq_chip gpcv2_irqchip_data_chip = {
#ifdef CONFIG_SMP
.irq_set_affinity = irq_chip_set_affinity_parent,
#endif
+#ifdef CONFIG_IPIPE
+ .irq_hold = imx_gpc_hold_irq,
+ .irq_release = imx_gpc_release_irq,
+#endif
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static int imx_gpcv2_domain_translate(struct irq_domain *d,
diff --git a/drivers/irqchip/irq-omap-intc.c b/drivers/irqchip/irq-omap-intc.c
index d360a6eddd6d..1fbcda774458 100644
--- a/drivers/irqchip/irq-omap-intc.c
+++ b/drivers/irqchip/irq-omap-intc.c
@@ -15,6 +15,7 @@
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/io.h>
+#include <asm/ipipe.h>
#include <asm/exception.h>
#include <linux/irqchip.h>
@@ -39,6 +40,7 @@
#define INTC_MIR_CLEAR0 0x0088
#define INTC_MIR_SET0 0x008c
#define INTC_PENDING_IRQ0 0x0098
+#define INTC_PRIO 0x0100
#define INTC_PENDING_IRQ1 0x00b8
#define INTC_PENDING_IRQ2 0x00d8
#define INTC_PENDING_IRQ3 0x00f8
@@ -49,6 +51,12 @@
#define INTCPS_NR_ILR_REGS 128
#define INTCPS_NR_MIR_REGS 4
+#if !defined(MULTI_OMAP1) && !defined(MULTI_OMAP2)
+#define inline_single inline
+#else
+#define inline_single
+#endif
+
#define INTC_IDLE_FUNCIDLE (1 << 0)
#define INTC_IDLE_TURBO (1 << 1)
@@ -69,12 +77,12 @@ static void __iomem *omap_irq_base;
static int omap_nr_pending;
static int omap_nr_irqs;
-static void intc_writel(u32 reg, u32 val)
+static inline_single void intc_writel(u32 reg, u32 val)
{
writel_relaxed(val, omap_irq_base + reg);
}
-static u32 intc_readl(u32 reg)
+static inline_single u32 intc_readl(u32 reg)
{
return readl_relaxed(omap_irq_base + reg);
}
@@ -137,9 +145,10 @@ void omap3_intc_resume_idle(void)
}
/* XXX: FIQ and additional INTC support (only MPU at the moment) */
-static void omap_ack_irq(struct irq_data *d)
+static inline_single void omap_ack_irq(struct irq_data *d)
{
intc_writel(INTC_CONTROL, 0x1);
+ dsb();
}
static void omap_mask_ack_irq(struct irq_data *d)
@@ -164,8 +173,14 @@ static void __init omap_irq_soft_reset(void)
while (!(intc_readl(INTC_SYSSTATUS) & 0x1))
/* Wait for reset to complete */;
+#ifndef CONFIG_IPIPE
/* Enable autoidle */
intc_writel(INTC_SYSCONFIG, 1 << 0);
+#else /* CONFIG_IPIPE */
+ /* Disable autoidle */
+ intc_writel(INTC_SYSCONFIG, 0);
+ intc_writel(INTC_IDLE, 0x1);
+#endif /* CONFIG_IPIPE */
}
int omap_irq_pending(void)
@@ -211,7 +226,7 @@ static int __init omap_alloc_gc_of(struct irq_domain *d, void __iomem *base)
ct->chip.irq_mask = irq_gc_mask_disable_reg;
ct->chip.irq_unmask = irq_gc_unmask_enable_reg;
- ct->chip.flags |= IRQCHIP_SKIP_SET_WAKE;
+ ct->chip.flags |= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE;
ct->regs.enable = INTC_MIR_CLEAR0 + 32 * i;
ct->regs.disable = INTC_MIR_SET0 + 32 * i;
@@ -231,8 +246,11 @@ static void __init omap_alloc_gc_legacy(void __iomem *base,
ct = gc->chip_types;
ct->chip.irq_ack = omap_mask_ack_irq;
ct->chip.irq_mask = irq_gc_mask_disable_reg;
+#ifdef CONFIG_IPIPE
+ ct->chip.irq_mask_ack = omap_mask_ack_irq;
+#endif
ct->chip.irq_unmask = irq_gc_unmask_enable_reg;
- ct->chip.flags |= IRQCHIP_SKIP_SET_WAKE;
+ ct->chip.flags |= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE;
ct->regs.enable = INTC_MIR_CLEAR0;
ct->regs.disable = INTC_MIR_SET0;
@@ -357,7 +375,7 @@ omap_intc_handle_irq(struct pt_regs *regs)
}
irqnr &= ACTIVEIRQ_MASK;
- handle_domain_irq(domain, irqnr, regs);
+ ipipe_handle_domain_irq(domain, irqnr, regs);
}
static int __init intc_of_init(struct device_node *node,
@@ -387,6 +405,28 @@ static int __init intc_of_init(struct device_node *node,
return 0;
}
+#if defined(CONFIG_IPIPE) && defined(CONFIG_ARCH_OMAP2PLUS)
+#if defined(CONFIG_ARCH_OMAP3) || defined(CONFIG_SOC_AM33XX)
+void omap3_intc_mute(void)
+{
+ intc_writel(INTC_THRESHOLD, 0x1);
+ intc_writel(INTC_CONTROL, 0x1);
+}
+
+void omap3_intc_unmute(void)
+{
+ intc_writel(INTC_THRESHOLD, 0xff);
+}
+
+void omap3_intc_set_irq_prio(int irq, int hi)
+{
+ if (irq >= INTCPS_NR_MIR_REGS * 32)
+ return;
+ intc_writel(INTC_PRIO + 4 * irq, hi ? 0 : 0xfc);
+}
+#endif /* CONFIG_ARCH_OMAP3 */
+#endif /* CONFIG_IPIPE && ARCH_OMAP2PLUS */
+
IRQCHIP_DECLARE(omap2_intc, "ti,omap2-intc", intc_of_init);
IRQCHIP_DECLARE(omap3_intc, "ti,omap3-intc", intc_of_init);
IRQCHIP_DECLARE(dm814x_intc, "ti,dm814-intc", intc_of_init);
diff --git a/drivers/irqchip/irq-sunxi-nmi.c b/drivers/irqchip/irq-sunxi-nmi.c
index a412b5d5d0fa..015056bf4004 100644
--- a/drivers/irqchip/irq-sunxi-nmi.c
+++ b/drivers/irqchip/irq-sunxi-nmi.c
@@ -115,8 +115,9 @@ static int sunxi_sc_nmi_set_type(struct irq_data *data, unsigned int flow_type)
u32 ctrl_off = ct->regs.type;
unsigned int src_type;
unsigned int i;
+ unsigned long flags;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
switch (flow_type & IRQF_TRIGGER_MASK) {
case IRQ_TYPE_EDGE_FALLING:
@@ -133,7 +134,7 @@ static int sunxi_sc_nmi_set_type(struct irq_data *data, unsigned int flow_type)
src_type = SUNXI_SRC_TYPE_LEVEL_LOW;
break;
default:
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
pr_err("Cannot assign multiple trigger modes to IRQ %d.\n",
data->irq);
return -EBADR;
@@ -151,7 +152,7 @@ static int sunxi_sc_nmi_set_type(struct irq_data *data, unsigned int flow_type)
src_type_reg |= src_type;
sunxi_sc_nmi_write(gc, ctrl_off, src_type_reg);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
return IRQ_SET_MASK_OK;
}
@@ -200,7 +201,7 @@ static int __init sunxi_sc_nmi_irq_init(struct device_node *node,
gc->chip_types[0].chip.irq_unmask = irq_gc_mask_set_bit;
gc->chip_types[0].chip.irq_eoi = irq_gc_ack_set_bit;
gc->chip_types[0].chip.irq_set_type = sunxi_sc_nmi_set_type;
- gc->chip_types[0].chip.flags = IRQCHIP_EOI_THREADED | IRQCHIP_EOI_IF_HANDLED;
+ gc->chip_types[0].chip.flags = IRQCHIP_EOI_THREADED | IRQCHIP_EOI_IF_HANDLED | IRQCHIP_PIPELINE_SAFE;
gc->chip_types[0].regs.ack = reg_offs->pend;
gc->chip_types[0].regs.mask = reg_offs->enable;
gc->chip_types[0].regs.type = reg_offs->ctrl;
@@ -211,6 +212,7 @@ static int __init sunxi_sc_nmi_irq_init(struct device_node *node,
gc->chip_types[1].chip.irq_mask = irq_gc_mask_clr_bit;
gc->chip_types[1].chip.irq_unmask = irq_gc_mask_set_bit;
gc->chip_types[1].chip.irq_set_type = sunxi_sc_nmi_set_type;
+ gc->chip_types[1].chip.flags = IRQCHIP_PIPELINE_SAFE;
gc->chip_types[1].regs.ack = reg_offs->pend;
gc->chip_types[1].regs.mask = reg_offs->enable;
gc->chip_types[1].regs.type = reg_offs->ctrl;
diff --git a/drivers/irqchip/irq-versatile-fpga.c b/drivers/irqchip/irq-versatile-fpga.c
index f1386733d3bc..78d3e441fb5d 100644
--- a/drivers/irqchip/irq-versatile-fpga.c
+++ b/drivers/irqchip/irq-versatile-fpga.c
@@ -85,7 +85,7 @@ static void fpga_irq_handle(struct irq_desc *desc)
unsigned int irq = ffs(status) - 1;
status &= ~(1 << irq);
- generic_handle_irq(irq_find_mapping(f->domain, irq));
+ ipipe_handle_demuxed_irq(irq_find_mapping(f->domain, irq));
} while (status);
out:
@@ -105,7 +105,7 @@ static int handle_one_fpga(struct fpga_irq_data *f, struct pt_regs *regs)
while ((status = readl(f->base + IRQ_STATUS))) {
irq = ffs(status) - 1;
- handle_domain_irq(f->domain, irq, regs);
+ ipipe_handle_domain_irq(f->domain, irq, regs);
handled = 1;
}
@@ -161,7 +161,11 @@ void __init fpga_irq_init(void __iomem *base, const char *name, int irq_start,
f->chip.name = name;
f->chip.irq_ack = fpga_irq_mask;
f->chip.irq_mask = fpga_irq_mask;
+#ifdef CONFIG_IPIPE
+ f->chip.irq_mask_ack = fpga_irq_mask;
+#endif
f->chip.irq_unmask = fpga_irq_unmask;
+ f->chip.flags = IRQCHIP_PIPELINE_SAFE;
f->valid = valid;
if (parent_irq != -1) {
diff --git a/drivers/irqchip/irq-vic.c b/drivers/irqchip/irq-vic.c
index f3f20a3cff50..9ed65876349f 100644
--- a/drivers/irqchip/irq-vic.c
+++ b/drivers/irqchip/irq-vic.c
@@ -21,6 +21,7 @@
#include <linux/device.h>
#include <linux/amba/bus.h>
#include <linux/irqchip/arm-vic.h>
+#include <linux/ipipe.h>
#include <asm/exception.h>
#include <asm/irq.h>
@@ -205,7 +206,7 @@ static int handle_one_vic(struct vic_device *vic, struct pt_regs *regs)
while ((stat = readl_relaxed(vic->base + VIC_IRQ_STATUS))) {
irq = ffs(stat) - 1;
- handle_domain_irq(vic->domain, irq, regs);
+ ipipe_handle_domain_irq(vic->domain, irq, regs);
handled = 1;
}
@@ -222,7 +223,7 @@ static void vic_handle_irq_cascaded(struct irq_desc *desc)
while ((stat = readl_relaxed(vic->base + VIC_IRQ_STATUS))) {
hwirq = ffs(stat) - 1;
- generic_handle_irq(irq_find_mapping(vic->domain, hwirq));
+ ipipe_handle_demuxed_irq(irq_find_mapping(vic->domain, hwirq));
}
chained_irq_exit(host_chip, desc);
@@ -326,7 +327,7 @@ static void vic_unmask_irq(struct irq_data *d)
#if defined(CONFIG_PM)
static struct vic_device *vic_from_irq(unsigned int irq)
{
- struct vic_device *v = vic_devices;
+ struct vic_device *v = vic_devices;
unsigned int base_irq = irq & ~31;
int id;
@@ -365,8 +366,12 @@ static struct irq_chip vic_chip = {
.name = "VIC",
.irq_ack = vic_ack_irq,
.irq_mask = vic_mask_irq,
+#ifdef CONFIG_IPIPE
+ .irq_mask_ack = vic_ack_irq,
+#endif /* CONFIG_IPIPE */
.irq_unmask = vic_unmask_irq,
.irq_set_wake = vic_set_wake,
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static void __init vic_disable(void __iomem *base)
diff --git a/drivers/memory/omap-gpmc.c b/drivers/memory/omap-gpmc.c
index 332ffd7cf8b0..27dd6a9a60fb 100644
--- a/drivers/memory/omap-gpmc.c
+++ b/drivers/memory/omap-gpmc.c
@@ -1262,12 +1262,15 @@ int gpmc_get_client_irq(unsigned irq_config)
static int gpmc_irq_endis(unsigned long hwirq, bool endis)
{
+ unsigned long flags;
u32 regval;
/* bits GPMC_NR_NAND_IRQS to 8 are reserved */
if (hwirq >= GPMC_NR_NAND_IRQS)
hwirq += 8 - GPMC_NR_NAND_IRQS;
+ flags = hard_local_irq_save();
+
regval = gpmc_read_reg(GPMC_IRQENABLE);
if (endis)
regval |= BIT(hwirq);
@@ -1275,6 +1278,8 @@ static int gpmc_irq_endis(unsigned long hwirq, bool endis)
regval &= ~BIT(hwirq);
gpmc_write_reg(GPMC_IRQENABLE, regval);
+ hard_local_irq_restore(flags);
+
return 0;
}
@@ -1300,6 +1305,7 @@ static void gpmc_irq_unmask(struct irq_data *d)
static void gpmc_irq_edge_config(unsigned long hwirq, bool rising_edge)
{
+ unsigned long flags;
u32 regval;
/* NAND IRQs polarity is not configurable */
@@ -1309,6 +1315,8 @@ static void gpmc_irq_edge_config(unsigned long hwirq, bool rising_edge)
/* WAITPIN starts at BIT 8 */
hwirq += 8 - GPMC_NR_NAND_IRQS;
+ flags = hard_local_irq_save();
+
regval = gpmc_read_reg(GPMC_CONFIG);
if (rising_edge)
regval &= ~BIT(hwirq);
@@ -1316,6 +1324,8 @@ static void gpmc_irq_edge_config(unsigned long hwirq, bool rising_edge)
regval |= BIT(hwirq);
gpmc_write_reg(GPMC_CONFIG, regval);
+
+ hard_local_irq_restore(flags);
}
static void gpmc_irq_ack(struct irq_data *d)
@@ -1395,7 +1405,7 @@ static irqreturn_t gpmc_handle_irq(int irq, void *data)
hwirq, virq);
}
- generic_handle_irq(virq);
+ ipipe_handle_demuxed_irq(virq);
}
}
@@ -1423,6 +1433,7 @@ static int gpmc_setup_irq(struct gpmc_device *gpmc)
gpmc->irq_chip.irq_mask = gpmc_irq_mask;
gpmc->irq_chip.irq_unmask = gpmc_irq_unmask;
gpmc->irq_chip.irq_set_type = gpmc_irq_set_type;
+ gpmc->irq_chip.flags |= IRQCHIP_PIPELINE_SAFE;
gpmc_irq_domain = irq_domain_add_linear(gpmc->dev->of_node,
gpmc->nirqs,
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index fbcb211cceb4..c9fd4e4966ba 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -66,6 +66,7 @@ static struct irq_chip dw_pcie_msi_irq_chip = {
.irq_ack = dw_msi_ack_irq,
.irq_mask = dw_msi_mask_irq,
.irq_unmask = dw_msi_unmask_irq,
+ .flags = IRQCHIP_PIPELINE_SAFE,
};
static struct msi_domain_info dw_pcie_msi_domain_info = {
diff --git a/drivers/pci/controller/pcie-altera.c b/drivers/pci/controller/pcie-altera.c
index d2497ca43828..9c5b64a59973 100644
--- a/drivers/pci/controller/pcie-altera.c
+++ b/drivers/pci/controller/pcie-altera.c
@@ -661,7 +661,7 @@ static void altera_pcie_isr(struct irq_desc *desc)
virq = irq_find_mapping(pcie->irq_domain, bit);
if (virq)
- generic_handle_irq(virq);
+ ipipe_handle_demuxed_irq(virq);
else
dev_err(dev, "unexpected IRQ, INT%d\n", bit);
}
diff --git a/drivers/pinctrl/bcm/pinctrl-bcm2835.c b/drivers/pinctrl/bcm/pinctrl-bcm2835.c
index e4da3217e939..0ad464f90444 100644
--- a/drivers/pinctrl/bcm/pinctrl-bcm2835.c
+++ b/drivers/pinctrl/bcm/pinctrl-bcm2835.c
@@ -18,6 +18,7 @@
#include <linux/io.h>
#include <linux/irq.h>
#include <linux/irqdesc.h>
+#include <linux/ipipe.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/of_address.h>
@@ -88,7 +89,11 @@ struct bcm2835_pinctrl {
struct pinctrl_desc pctl_desc;
struct pinctrl_gpio_range gpio_range;
+#ifdef CONFIG_IPIPE
+ ipipe_spinlock_t irq_lock[BCM2835_NUM_BANKS];
+#else
raw_spinlock_t irq_lock[BCM2835_NUM_BANKS];
+#endif
};
/* pins are just named GPIO0..GPIO53 */
@@ -392,7 +397,7 @@ static void bcm2835_gpio_irq_handle_bank(struct bcm2835_pinctrl *pc,
events &= pc->enabled_irq_map[bank];
for_each_set_bit(offset, &events, 32) {
gpio = (32 * bank) + offset;
- generic_handle_irq(irq_linear_revmap(pc->gpio_chip.irq.domain,
+ ipipe_handle_demuxed_irq(irq_linear_revmap(pc->gpio_chip.irq.domain,
gpio));
}
}
@@ -492,6 +497,7 @@ static void bcm2835_gpio_irq_enable(struct irq_data *data)
raw_spin_lock_irqsave(&pc->irq_lock[bank], flags);
set_bit(offset, &pc->enabled_irq_map[bank]);
bcm2835_gpio_irq_config(pc, gpio, true);
+ ipipe_unlock_irq(data->irq);
raw_spin_unlock_irqrestore(&pc->irq_lock[bank], flags);
}
@@ -509,6 +515,7 @@ static void bcm2835_gpio_irq_disable(struct irq_data *data)
/* Clear events that were latched prior to clearing event sources */
bcm2835_gpio_set_bit(pc, GPEDS0, gpio);
clear_bit(offset, &pc->enabled_irq_map[bank]);
+ ipipe_lock_irq(data->irq);
raw_spin_unlock_irqrestore(&pc->irq_lock[bank], flags);
}
@@ -666,6 +673,39 @@ static int bcm2835_gpio_irq_set_wake(struct irq_data *data, unsigned int on)
return ret;
}
+#ifdef CONFIG_IPIPE
+
+static void bcm2835_gpio_irq_hold(struct irq_data *data)
+{
+ struct bcm2835_pinctrl *pc = irq_data_get_irq_chip_data(data);
+ unsigned gpio = irqd_to_hwirq(data);
+ unsigned offset = GPIO_REG_SHIFT(gpio);
+ unsigned bank = GPIO_REG_OFFSET(gpio);
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&pc->irq_lock[bank], flags);
+ bcm2835_gpio_irq_config(pc, gpio, false);
+ bcm2835_gpio_set_bit(pc, GPEDS0, gpio);
+ clear_bit(offset, &pc->enabled_irq_map[bank]);
+ raw_spin_unlock_irqrestore(&pc->irq_lock[bank], flags);
+}
+
+static void bcm2835_gpio_irq_release(struct irq_data *data)
+{
+ struct bcm2835_pinctrl *pc = irq_data_get_irq_chip_data(data);
+ unsigned gpio = irqd_to_hwirq(data);
+ unsigned offset = GPIO_REG_SHIFT(gpio);
+ unsigned bank = GPIO_REG_OFFSET(gpio);
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&pc->irq_lock[bank], flags);
+ set_bit(offset, &pc->enabled_irq_map[bank]);
+ bcm2835_gpio_irq_config(pc, gpio, true);
+ raw_spin_unlock_irqrestore(&pc->irq_lock[bank], flags);
+}
+
+#endif
+
static struct irq_chip bcm2835_gpio_irq_chip = {
.name = MODULE_NAME,
.irq_enable = bcm2835_gpio_irq_enable,
@@ -675,7 +715,11 @@ static struct irq_chip bcm2835_gpio_irq_chip = {
.irq_mask = bcm2835_gpio_irq_disable,
.irq_unmask = bcm2835_gpio_irq_enable,
.irq_set_wake = bcm2835_gpio_irq_set_wake,
- .flags = IRQCHIP_MASK_ON_SUSPEND,
+#ifdef CONFIG_IPIPE
+ .irq_hold = bcm2835_gpio_irq_hold,
+ .irq_release = bcm2835_gpio_irq_release,
+#endif
+ .flags = IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE,
};
static int bcm2835_pctl_get_groups_count(struct pinctrl_dev *pctldev)
diff --git a/drivers/pinctrl/intel/pinctrl-intel.c b/drivers/pinctrl/intel/pinctrl-intel.c
index 32c6326337f7..f5f3615237c0 100644
--- a/drivers/pinctrl/intel/pinctrl-intel.c
+++ b/drivers/pinctrl/intel/pinctrl-intel.c
@@ -1148,7 +1148,9 @@ static irqreturn_t intel_gpio_community_irq_handler(struct intel_pinctrl *pctrl,
irq = irq_find_mapping(gc->irq.domain,
padgrp->gpio_base + gpp_offset);
- generic_handle_irq(irq);
+ hard_cond_local_irq_disable();
+ ipipe_handle_demuxed_irq(irq);
+ hard_cond_local_irq_enable();
ret |= IRQ_HANDLED;
}
@@ -1236,7 +1238,7 @@ static int intel_gpio_probe(struct intel_pinctrl *pctrl, int irq)
pctrl->irqchip.irq_unmask = intel_gpio_irq_unmask;
pctrl->irqchip.irq_set_type = intel_gpio_irq_type;
pctrl->irqchip.irq_set_wake = intel_gpio_irq_wake;
- pctrl->irqchip.flags = IRQCHIP_MASK_ON_SUSPEND;
+ pctrl->irqchip.flags = IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE;
ret = devm_gpiochip_add_data(pctrl->dev, &pctrl->chip, pctrl);
if (ret) {
diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c
index 4b972be3487f..0ee5cfce21d8 100644
--- a/drivers/pinctrl/pinctrl-rockchip.c
+++ b/drivers/pinctrl/pinctrl-rockchip.c
@@ -2904,7 +2904,7 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
u32 polarity;
u32 level;
u32 data;
- unsigned long flags;
+ unsigned long flags, flags2;
int ret;
/* make sure the pin is configured as gpio input */
@@ -2927,7 +2927,7 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
irq_set_handler_locked(d, handle_level_irq);
raw_spin_lock_irqsave(&bank->slock, flags);
- irq_gc_lock(gc);
+ flags2 = irq_gc_lock(gc);
level = readl_relaxed(gc->reg_base + GPIO_INTTYPE_LEVEL);
polarity = readl_relaxed(gc->reg_base + GPIO_INT_POLARITY);
@@ -2968,7 +2968,7 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
polarity &= ~mask;
break;
default:
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags2);
raw_spin_unlock_irqrestore(&bank->slock, flags);
clk_disable(bank->clk);
return -EINVAL;
@@ -2977,7 +2977,7 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
writel_relaxed(level, gc->reg_base + GPIO_INTTYPE_LEVEL);
writel_relaxed(polarity, gc->reg_base + GPIO_INT_POLARITY);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags2);
raw_spin_unlock_irqrestore(&bank->slock, flags);
clk_disable(bank->clk);
diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index ce5be6f0b7aa..02917f6de8a8 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -16,6 +16,7 @@
#include <linux/err.h>
#include <linux/list.h>
#include <linux/interrupt.h>
+#include <linux/ipipe.h>
#include <linux/irqchip/chained_irq.h>
@@ -185,7 +186,11 @@ struct pcs_device {
#define PCS_FEAT_PINCONF (1 << 0)
struct property *missing_nr_pinctrl_cells;
struct pcs_soc_data socdata;
+#ifdef CONFIG_IPIPE
+ ipipe_spinlock_t lock;
+#else /* !IPIPE */
raw_spinlock_t lock;
+#endif /* !IPIPE */
struct mutex mutex;
unsigned width;
unsigned fmask;
@@ -1466,7 +1471,7 @@ static int pcs_irq_handle(struct pcs_soc_data *pcs_soc)
mask = pcs->read(pcswi->reg);
raw_spin_unlock(&pcs->lock);
if (mask & pcs_soc->irq_status_mask) {
- generic_handle_irq(irq_find_mapping(pcs->domain,
+ ipipe_handle_demuxed_irq(irq_find_mapping(pcs->domain,
pcswi->hwirq));
count++;
}
@@ -1486,8 +1491,14 @@ static int pcs_irq_handle(struct pcs_soc_data *pcs_soc)
static irqreturn_t pcs_irq_handler(int irq, void *d)
{
struct pcs_soc_data *pcs_soc = d;
+ unsigned long flags;
+ irqreturn_t ret;
- return pcs_irq_handle(pcs_soc) ? IRQ_HANDLED : IRQ_NONE;
+ flags = hard_cond_local_irq_save();
+ ret = pcs_irq_handle(pcs_soc) ? IRQ_HANDLED : IRQ_NONE;
+ hard_cond_local_irq_restore(flags);
+
+ return ret;
}
/**
diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.c b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
index 8c41f8b818b2..91347f19b1bd 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sunxi.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
@@ -15,6 +15,7 @@
#include <linux/gpio/driver.h>
#include <linux/irqdomain.h>
#include <linux/irqchip/chained_irq.h>
+#include <linux/ipipe.h>
#include <linux/export.h>
#include <linux/of.h>
#include <linux/of_clk.h>
@@ -1069,14 +1070,33 @@ static struct irq_chip sunxi_pinctrl_edge_irq_chip = {
.irq_request_resources = sunxi_pinctrl_irq_request_resources,
.irq_release_resources = sunxi_pinctrl_irq_release_resources,
.irq_set_type = sunxi_pinctrl_irq_set_type,
- .flags = IRQCHIP_SKIP_SET_WAKE,
+ .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
};
+#ifdef CONFIG_IPIPE
+
+static void sunxi_pinctrl_irq_hold(struct irq_data *d)
+{
+ sunxi_pinctrl_irq_mask(d);
+ sunxi_pinctrl_irq_ack(d);
+}
+
+static void sunxi_pinctrl_irq_release(struct irq_data *d)
+{
+ sunxi_pinctrl_irq_unmask(d);
+}
+
+#endif
+
static struct irq_chip sunxi_pinctrl_level_irq_chip = {
.name = "sunxi_pio_level",
.irq_eoi = sunxi_pinctrl_irq_ack,
.irq_mask = sunxi_pinctrl_irq_mask,
.irq_unmask = sunxi_pinctrl_irq_unmask,
+#ifdef CONFIG_IPIPE
+ .irq_hold = sunxi_pinctrl_irq_hold,
+ .irq_release = sunxi_pinctrl_irq_release,
+#endif
/* Define irq_enable / disable to avoid spurious irqs for drivers
* using these to suppress irqs while they clear the irq source */
.irq_enable = sunxi_pinctrl_irq_ack_unmask,
@@ -1085,7 +1105,7 @@ static struct irq_chip sunxi_pinctrl_level_irq_chip = {
.irq_release_resources = sunxi_pinctrl_irq_release_resources,
.irq_set_type = sunxi_pinctrl_irq_set_type,
.flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_EOI_THREADED |
- IRQCHIP_EOI_IF_HANDLED,
+ IRQCHIP_EOI_IF_HANDLED | IRQCHIP_PIPELINE_SAFE,
};
static int sunxi_pinctrl_irq_of_xlate(struct irq_domain *d,
@@ -1144,7 +1164,7 @@ static void sunxi_pinctrl_irq_handler(struct irq_desc *desc)
for_each_set_bit(irqoffset, &val, IRQ_PER_BANK) {
int pin_irq = irq_find_mapping(pctl->domain,
bank * IRQ_PER_BANK + irqoffset);
- generic_handle_irq(pin_irq);
+ ipipe_handle_demuxed_irq(pin_irq);
}
}
diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.h b/drivers/pinctrl/sunxi/pinctrl-sunxi.h
index a32bb5bcb754..f1139039d2b0 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sunxi.h
+++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.h
@@ -167,7 +167,11 @@ struct sunxi_pinctrl {
unsigned ngroups;
int *irq;
unsigned *irq_array;
+#ifdef CONFIG_IPIPE
+ ipipe_spinlock_t lock;
+#else
raw_spinlock_t lock;
+#endif
struct pinctrl_dev *pctl_dev;
unsigned long variant;
};
diff --git a/drivers/soc/dove/pmu.c b/drivers/soc/dove/pmu.c
index ffc5311c0ed8..ad379f0f65e8 100644
--- a/drivers/soc/dove/pmu.c
+++ b/drivers/soc/dove/pmu.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/soc/dove/pmu.h>
#include <linux/spinlock.h>
+#include <linux/ipipe.h>
#define NR_PMU_IRQS 7
@@ -231,6 +232,7 @@ static void pmu_irq_handler(struct irq_desc *desc)
void __iomem *base = gc->reg_base;
u32 stat = readl_relaxed(base + PMC_IRQ_CAUSE) & gc->mask_cache;
u32 done = ~0;
+ unsigned long flags;
if (stat == 0) {
handle_bad_irq(desc);
@@ -243,7 +245,7 @@ static void pmu_irq_handler(struct irq_desc *desc)
stat &= ~(1 << hwirq);
done &= ~(1 << hwirq);
- generic_handle_irq(irq_find_mapping(domain, hwirq));
+ ipipe_handle_demuxed_irq(irq_find_mapping(domain, hwirq));
}
/*
@@ -257,10 +259,10 @@ static void pmu_irq_handler(struct irq_desc *desc)
* So, let's structure the code so that the window is as small as
* possible.
*/
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
done &= readl_relaxed(base + PMC_IRQ_CAUSE);
writel_relaxed(done, base + PMC_IRQ_CAUSE);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
static int __init dove_init_pmu_irq(struct pmu_data *pmu, int irq)
@@ -296,6 +298,7 @@ static int __init dove_init_pmu_irq(struct pmu_data *pmu, int irq)
gc->chip_types[0].regs.mask = PMC_IRQ_MASK;
gc->chip_types[0].chip.irq_mask = irq_gc_mask_clr_bit;
gc->chip_types[0].chip.irq_unmask = irq_gc_mask_set_bit;
+ gc->chip_types[0].chip.flags |= IRQCHIP_PIPELINE_SAFE;
pmu->irq_domain = domain;
pmu->irq_gc = gc;
diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 2675771a03a0..cf293151b4e6 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -586,6 +586,48 @@ static void univ8250_console_write(struct console *co, const char *s,
serial8250_console_write(up, s, count);
}
+#ifdef CONFIG_RAW_PRINTK
+
+static void raw_write_char(struct uart_8250_port *up, int c)
+{
+ unsigned int status, tmout = 10000;
+
+ for (;;) {
+ status = serial_in(up, UART_LSR);
+ up->lsr_saved_flags |= status & LSR_SAVE_FLAGS;
+ if ((status & UART_LSR_THRE) == UART_LSR_THRE)
+ break;
+ if (--tmout == 0)
+ break;
+ cpu_relax();
+ }
+ serial_port_out(&up->port, UART_TX, c);
+}
+
+static void univ8250_console_write_raw(struct console *co, const char *s,
+ unsigned int count)
+{
+ struct uart_8250_port *up = &serial8250_ports[co->index];
+ unsigned int ier;
+
+ ier = serial_in(up, UART_IER);
+
+ if (up->capabilities & UART_CAP_UUE)
+ serial_out(up, UART_IER, UART_IER_UUE);
+ else
+ serial_out(up, UART_IER, 0);
+
+ while (count-- > 0) {
+ if (*s == '\n')
+ raw_write_char(up, '\r');
+ raw_write_char(up, *s++);
+ }
+
+ serial_out(up, UART_IER, ier);
+}
+
+#endif
+
static int univ8250_console_setup(struct console *co, char *options)
{
struct uart_port *port;
@@ -667,7 +709,12 @@ static struct console univ8250_console = {
.device = uart_console_device,
.setup = univ8250_console_setup,
.match = univ8250_console_match,
+#ifdef CONFIG_RAW_PRINTK
+ .write_raw = univ8250_console_write_raw,
+ .flags = CON_PRINTBUFFER | CON_ANYTIME | CON_RAW,
+#else
.flags = CON_PRINTBUFFER | CON_ANYTIME,
+#endif
.index = -1,
.data = &serial8250_reg,
};
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 86084090232d..caf15a9fd694 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -2217,6 +2217,42 @@ static void pl011_console_putchar(struct uart_port *port, int ch)
pl011_write(ch, uap, REG_DR);
}
+#ifdef CONFIG_RAW_PRINTK
+
+#define pl011_clk_setup(clk) clk_prepare_enable(clk)
+#define pl011_clk_enable(clk) do { } while (0)
+#define pl011_clk_disable(clk) do { } while (0)
+
+static void
+pl011_console_write_raw(struct console *co, const char *s, unsigned int count)
+{
+ struct uart_amba_port *uap = amba_ports[co->index];
+ unsigned int old_cr, new_cr, status;
+
+ old_cr = readw(uap->port.membase + UART011_CR);
+ new_cr = old_cr & ~UART011_CR_CTSEN;
+ new_cr |= UART01x_CR_UARTEN | UART011_CR_TXE;
+ writew(new_cr, uap->port.membase + UART011_CR);
+
+ while (count-- > 0) {
+ if (*s == '\n')
+ pl011_console_putchar(&uap->port, '\r');
+ pl011_console_putchar(&uap->port, *s++);
+ }
+ do
+ status = readw(uap->port.membase + UART01x_FR);
+ while (status & UART01x_FR_BUSY);
+ writew(old_cr, uap->port.membase + UART011_CR);
+}
+
+#else /* !CONFIG_RAW_PRINTK */
+
+#define pl011_clk_setup(clk) clk_prepare(clk)
+#define pl011_clk_enable(clk) clk_enable(clk)
+#define pl011_clk_disable(clk) clk_disable(clk)
+
+#endif /* !CONFIG_RAW_PRINTK */
+
static void
pl011_console_write(struct console *co, const char *s, unsigned int count)
{
@@ -2225,7 +2261,7 @@ pl011_console_write(struct console *co, const char *s, unsigned int count)
unsigned long flags;
int locked = 1;
- clk_enable(uap->clk);
+ pl011_clk_enable(uap->clk);
local_irq_save(flags);
if (uap->port.sysrq)
@@ -2262,7 +2298,7 @@ pl011_console_write(struct console *co, const char *s, unsigned int count)
spin_unlock(&uap->port.lock);
local_irq_restore(flags);
- clk_disable(uap->clk);
+ pl011_clk_disable(uap->clk);
}
static void pl011_console_get_options(struct uart_amba_port *uap, int *baud,
@@ -2322,7 +2358,7 @@ static int pl011_console_setup(struct console *co, char *options)
/* Allow pins to be muxed in and configured */
pinctrl_pm_select_default_state(uap->port.dev);
- ret = clk_prepare(uap->clk);
+ ret = pl011_clk_setup(uap->clk);
if (ret)
return ret;
@@ -2416,7 +2452,12 @@ static struct console amba_console = {
.device = uart_console_device,
.setup = pl011_console_setup,
.match = pl011_console_match,
+#ifdef CONFIG_RAW_PRINTK
+ .write_raw = pl011_console_write_raw,
+ .flags = CON_PRINTBUFFER | CON_RAW | CON_ANYTIME,
+#else
.flags = CON_PRINTBUFFER | CON_ANYTIME,
+#endif
.index = -1,
.data = &amba_reg,
};
diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c
index 6842999072c5..485f1d745e6c 100644
--- a/drivers/tty/serial/xilinx_uartps.c
+++ b/drivers/tty/serial/xilinx_uartps.c
@@ -1236,6 +1236,34 @@ static void cdns_uart_console_write(struct console *co, const char *s,
spin_unlock_irqrestore(&port->lock, flags);
}
+#ifdef CONFIG_RAW_PRINTK
+
+static void cdns_uart_console_write_raw(struct console *co, const char *s,
+ unsigned int count)
+{
+ struct uart_port *port = &cdns_uart_port[co->index];
+ unsigned int imr, ctrl;
+
+ imr = readl(port->membase + CDNS_UART_IMR);
+ writel(imr, port->membase + CDNS_UART_IDR);
+
+ ctrl = readl(port->membase + CDNS_UART_CR);
+ ctrl &= ~CDNS_UART_CR_TX_DIS;
+ ctrl |= CDNS_UART_CR_TX_EN;
+ writel(ctrl, port->membase + CDNS_UART_CR);
+
+ while (count-- > 0) {
+ if (*s == '\n')
+ writel('\r', port->membase + CDNS_UART_FIFO);
+ writel(*s++, port->membase + CDNS_UART_FIFO);
+ }
+
+ writel(ctrl, port->membase + CDNS_UART_CR);
+ writel(imr, port->membase + CDNS_UART_IER);
+}
+
+#endif
+
/**
* cdns_uart_console_setup - Initialize the uart to default config
* @co: Console handle
@@ -1277,7 +1305,12 @@ static struct console cdns_uart_console = {
.write = cdns_uart_console_write,
.device = uart_console_device,
.setup = cdns_uart_console_setup,
+#ifdef CONFIG_RAW_PRINTK
+ .write_raw = cdns_uart_console_write_raw,
+ .flags = CON_PRINTBUFFER | CON_RAW,
+#else
.flags = CON_PRINTBUFFER,
+#endif
.index = -1, /* Specified on the cmdline (e.g. console=ttyPS ) */
.data = &cdns_uart_uart_driver,
};
diff --git a/fs/exec.c b/fs/exec.c
index a7d78241082a..914804c49d95 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -50,6 +50,7 @@
#include <linux/module.h>
#include <linux/namei.h>
#include <linux/mount.h>
+#include <linux/ipipe.h>
#include <linux/security.h>
#include <linux/syscalls.h>
#include <linux/tsacct_kern.h>
@@ -1025,6 +1026,7 @@ static int exec_mmap(struct mm_struct *mm)
{
struct task_struct *tsk;
struct mm_struct *old_mm, *active_mm;
+ unsigned long flags;
int ret;
/* Notify parent that we're no longer interested in the old VM */
@@ -1056,6 +1058,7 @@ static int exec_mmap(struct mm_struct *mm)
membarrier_exec_mmap(mm);
local_irq_disable();
+ ipipe_mm_switch_protect(flags);
active_mm = tsk->active_mm;
tsk->active_mm = mm;
tsk->mm = mm;
@@ -1066,10 +1069,15 @@ static int exec_mmap(struct mm_struct *mm)
* switches. Not all architectures can handle irqs off over
* activate_mm yet.
*/
- if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
+ if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM) &&
+ (!IS_ENABLED(CONFIG_IPIPE) ||
+ IS_ENABLED(CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH)))
local_irq_enable();
activate_mm(active_mm, mm);
- if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
+ ipipe_mm_switch_unprotect(flags);
+ if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM) ||
+ (IS_ENABLED(CONFIG_IPIPE) &&
+ !IS_ENABLED(CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH)))
local_irq_enable();
tsk->mm->vmacache_seqnum = 0;
vmacache_flush(tsk);
diff --git a/include/asm-generic/atomic.h b/include/asm-generic/atomic.h
index 286867f593d2..2e7a4dd9790a 100644
--- a/include/asm-generic/atomic.h
+++ b/include/asm-generic/atomic.h
@@ -76,9 +76,9 @@ static inline void atomic_##op(int i, atomic_t *v) \
{ \
unsigned long flags; \
\
- raw_local_irq_save(flags); \
+ flags = hard_local_irq_save(); \
v->counter = v->counter c_op i; \
- raw_local_irq_restore(flags); \
+ hard_local_irq_restore(flags); \
}
#define ATOMIC_OP_RETURN(op, c_op) \
@@ -87,9 +87,9 @@ static inline int atomic_##op##_return(int i, atomic_t *v) \
unsigned long flags; \
int ret; \
\
- raw_local_irq_save(flags); \
+ flags = hard_local_irq_save(); \
ret = (v->counter = v->counter c_op i); \
- raw_local_irq_restore(flags); \
+ hard_local_irq_restore(flags); \
\
return ret; \
}
@@ -100,10 +100,10 @@ static inline int atomic_fetch_##op(int i, atomic_t *v) \
unsigned long flags; \
int ret; \
\
- raw_local_irq_save(flags); \
+ flags = hard_local_irq_save(); \
ret = v->counter; \
v->counter = v->counter c_op i; \
- raw_local_irq_restore(flags); \
+ hard_local_irq_restore(flags); \
\
return ret; \
}
diff --git a/include/asm-generic/cmpxchg-local.h b/include/asm-generic/cmpxchg-local.h
index f17f14f84d09..e05f37fb7158 100644
--- a/include/asm-generic/cmpxchg-local.h
+++ b/include/asm-generic/cmpxchg-local.h
@@ -4,6 +4,7 @@
#include <linux/types.h>
#include <linux/irqflags.h>
+#include <asm-generic/ipipe.h>
extern unsigned long wrong_size_cmpxchg(volatile void *ptr)
__noreturn;
@@ -23,7 +24,7 @@ static inline unsigned long __cmpxchg_local_generic(volatile void *ptr,
if (size == 8 && sizeof(unsigned long) != 8)
wrong_size_cmpxchg(ptr);
- raw_local_irq_save(flags);
+ flags = hard_local_irq_save();
switch (size) {
case 1: prev = *(u8 *)ptr;
if (prev == old)
@@ -44,7 +45,7 @@ static inline unsigned long __cmpxchg_local_generic(volatile void *ptr,
default:
wrong_size_cmpxchg(ptr);
}
- raw_local_irq_restore(flags);
+ hard_local_irq_restore(flags);
return prev;
}
@@ -57,11 +58,11 @@ static inline u64 __cmpxchg64_local_generic(volatile void *ptr,
u64 prev;
unsigned long flags;
- raw_local_irq_save(flags);
+ flags = hard_local_irq_save();
prev = *(u64 *)ptr;
if (prev == old)
*(u64 *)ptr = new;
- raw_local_irq_restore(flags);
+ hard_local_irq_restore(flags);
return prev;
}
diff --git a/include/asm-generic/ipipe.h b/include/asm-generic/ipipe.h
new file mode 100644
index 000000000000..102ffffe4a54
--- /dev/null
+++ b/include/asm-generic/ipipe.h
@@ -0,0 +1,93 @@
+/* -*- linux-c -*-
+ * include/asm-generic/ipipe.h
+ *
+ * Copyright (C) 2002-2017 Philippe Gerum.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ */
+#ifndef __ASM_GENERIC_IPIPE_H
+#define __ASM_GENERIC_IPIPE_H
+
+#ifdef CONFIG_IPIPE
+
+#if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || defined(CONFIG_PROVE_LOCKING) || \
+ defined(CONFIG_PREEMPT_VOLUNTARY) || defined(CONFIG_IPIPE_DEBUG_CONTEXT)
+void __ipipe_uaccess_might_fault(void);
+#else
+#define __ipipe_uaccess_might_fault() might_fault()
+#endif
+
+#define hard_cond_local_irq_enable() hard_local_irq_enable()
+#define hard_cond_local_irq_disable() hard_local_irq_disable()
+#define hard_cond_local_irq_save() hard_local_irq_save()
+#define hard_cond_local_irq_restore(flags) hard_local_irq_restore(flags)
+
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+void ipipe_root_only(void);
+#else /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+static inline void ipipe_root_only(void) { }
+#endif /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+
+void ipipe_stall_root(void);
+
+void ipipe_unstall_root(void);
+
+unsigned long ipipe_test_and_stall_root(void);
+
+unsigned long ipipe_test_root(void);
+
+void ipipe_restore_root(unsigned long x);
+
+#else /* !CONFIG_IPIPE */
+
+#define hard_local_irq_save_notrace() \
+ ({ \
+ unsigned long __flags; \
+ raw_local_irq_save(__flags); \
+ __flags; \
+ })
+
+#define hard_local_irq_restore_notrace(__flags) \
+ raw_local_irq_restore(__flags)
+
+#define hard_local_irq_enable_notrace() \
+ raw_local_irq_enable()
+
+#define hard_local_irq_disable_notrace() \
+ raw_local_irq_disable()
+
+#define hard_local_irq_save() \
+ ({ \
+ unsigned long __flags; \
+ local_irq_save(__flags); \
+ __flags; \
+ })
+#define hard_local_irq_restore(__flags) local_irq_restore(__flags)
+#define hard_local_irq_enable() local_irq_enable()
+#define hard_local_irq_disable() local_irq_disable()
+#define hard_irqs_disabled() irqs_disabled()
+
+#define hard_cond_local_irq_enable() do { } while(0)
+#define hard_cond_local_irq_disable() do { } while(0)
+#define hard_cond_local_irq_save() 0
+#define hard_cond_local_irq_restore(__flags) do { (void)(__flags); } while(0)
+
+#define __ipipe_uaccess_might_fault() might_fault()
+
+static inline void ipipe_root_only(void) { }
+
+#endif /* !CONFIG_IPIPE */
+
+#if defined(CONFIG_SMP) && defined(CONFIG_IPIPE)
+#define hard_smp_local_irq_save() hard_local_irq_save()
+#define hard_smp_local_irq_restore(__flags) hard_local_irq_restore(__flags)
+#else /* !CONFIG_SMP */
+#define hard_smp_local_irq_save() 0
+#define hard_smp_local_irq_restore(__flags) do { (void)(__flags); } while(0)
+#endif /* CONFIG_SMP */
+
+#endif
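
A minimal sketch of how code shared between pipelined and non-pipelined builds can rely on the helpers above: with CONFIG_IPIPE enabled the hard_* calls mask interrupts in the CPU itself, while the fallback definitions in this header collapse them to the regular local_irq_* operations, so the same function compiles either way. The update_sample() helper and its per-CPU counter are illustrative names only.

    #include <linux/percpu.h>
    #include <asm-generic/ipipe.h>

    static DEFINE_PER_CPU(unsigned long, sample_count);   /* hypothetical counter */

    static void update_sample(void)
    {
            unsigned long flags;

            /* Really mask interrupts in the CPU, not just virtually. */
            flags = hard_local_irq_save();
            __this_cpu_inc(sample_count);
            hard_local_irq_restore(flags);
    }
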
diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h
index c2de013b2cf4..109a4bcd741a 100644
--- a/include/asm-generic/percpu.h
+++ b/include/asm-generic/percpu.h
@@ -5,6 +5,7 @@
#include <linux/compiler.h>
#include <linux/threads.h>
#include <linux/percpu-defs.h>
+#include <asm-generic/ipipe.h>
#ifdef CONFIG_SMP
@@ -44,11 +45,29 @@ extern unsigned long __per_cpu_offset[NR_CPUS];
#define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
#endif
+#ifdef CONFIG_IPIPE
+#if defined(CONFIG_IPIPE_DEBUG_INTERNAL) && defined(CONFIG_SMP)
+unsigned long __ipipe_cpu_get_offset(void);
+#define __ipipe_cpu_offset __ipipe_cpu_get_offset()
+#else
+#define __ipipe_cpu_offset __my_cpu_offset
+#endif
+#ifndef __ipipe_raw_cpu_ptr
+#define __ipipe_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __ipipe_cpu_offset)
+#endif
+#define __ipipe_raw_cpu_read(var) (*__ipipe_raw_cpu_ptr(&(var)))
+#endif /* CONFIG_IPIPE */
+
#ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
extern void setup_per_cpu_areas(void);
#endif
-#endif /* SMP */
+#else /* !SMP */
+
+#define __ipipe_raw_cpu_ptr(ptr) VERIFY_PERCPU_PTR(ptr)
+#define __ipipe_raw_cpu_read(var) (*__ipipe_raw_cpu_ptr(&(var)))
+
+#endif /* !SMP */
#ifndef PER_CPU_BASE_SECTION
#ifdef CONFIG_SMP
@@ -144,9 +163,9 @@ do { \
#define this_cpu_generic_to_op(pcp, val, op) \
do { \
unsigned long __flags; \
- raw_local_irq_save(__flags); \
+ __flags = hard_local_irq_save(); \
raw_cpu_generic_to_op(pcp, val, op); \
- raw_local_irq_restore(__flags); \
+ hard_local_irq_restore(__flags); \
} while (0)
@@ -154,9 +173,9 @@ do { \
({ \
typeof(pcp) __ret; \
unsigned long __flags; \
- raw_local_irq_save(__flags); \
+ __flags = hard_local_irq_save(); \
__ret = raw_cpu_generic_add_return(pcp, val); \
- raw_local_irq_restore(__flags); \
+ hard_local_irq_restore(__flags); \
__ret; \
})
@@ -164,9 +183,9 @@ do { \
({ \
typeof(pcp) __ret; \
unsigned long __flags; \
- raw_local_irq_save(__flags); \
+ __flags = hard_local_irq_save(); \
__ret = raw_cpu_generic_xchg(pcp, nval); \
- raw_local_irq_restore(__flags); \
+ hard_local_irq_restore(__flags); \
__ret; \
})
@@ -174,9 +193,9 @@ do { \
({ \
typeof(pcp) __ret; \
unsigned long __flags; \
- raw_local_irq_save(__flags); \
+ __flags = hard_local_irq_save(); \
__ret = raw_cpu_generic_cmpxchg(pcp, oval, nval); \
- raw_local_irq_restore(__flags); \
+ hard_local_irq_restore(__flags); \
__ret; \
})
@@ -184,10 +203,10 @@ do { \
({ \
int __ret; \
unsigned long __flags; \
- raw_local_irq_save(__flags); \
+ __flags = hard_local_irq_save(); \
__ret = raw_cpu_generic_cmpxchg_double(pcp1, pcp2, \
oval1, oval2, nval1, nval2); \
- raw_local_irq_restore(__flags); \
+ hard_local_irq_restore(__flags); \
__ret; \
})
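
The __ipipe_raw_cpu_ptr()/__ipipe_raw_cpu_read() accessors introduced above are the pipeline's way of reaching per-CPU data from any domain; callers are expected to run with hard IRQs off so the CPU cannot change under them. A short sketch, with a hypothetical per-CPU counter:

    #include <linux/types.h>
    #include <linux/percpu.h>

    static DEFINE_PER_CPU(u64, oob_event_count);    /* hypothetical */

    /* Called from the head domain with hard IRQs off. */
    static void oob_count_event(void)
    {
            u64 *p = __ipipe_raw_cpu_ptr(&oob_event_count);

            (*p)++;
    }
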
diff --git a/include/asm-generic/switch_to.h b/include/asm-generic/switch_to.h
index 5897d100a6e6..600fcb9f6cd9 100644
--- a/include/asm-generic/switch_to.h
+++ b/include/asm-generic/switch_to.h
@@ -17,10 +17,17 @@
*/
extern struct task_struct *__switch_to(struct task_struct *,
struct task_struct *);
-
+#ifdef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
#define switch_to(prev, next, last) \
do { \
+ hard_cond_local_irq_disable(); \
((last) = __switch_to((prev), (next))); \
+ hard_cond_local_irq_enable(); \
} while (0)
-
+#else /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+#define switch_to(prev, next, last) \
+ do { \
+ ((last) = __switch_to((prev), (next))); \
+ } while (0)
+#endif /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
#endif /* __ASM_GENERIC_SWITCH_TO_H */
diff --git a/include/clocksource/timer-sp804.h b/include/clocksource/timer-sp804.h
index a5b41f31a1c2..461da5cc7cdf 100644
--- a/include/clocksource/timer-sp804.h
+++ b/include/clocksource/timer-sp804.h
@@ -5,20 +5,23 @@
struct clk;
int __sp804_clocksource_and_sched_clock_init(void __iomem *,
+ unsigned long phys,
const char *, struct clk *, int);
int __sp804_clockevents_init(void __iomem *, unsigned int,
struct clk *, const char *);
void sp804_timer_disable(void __iomem *);
-static inline void sp804_clocksource_init(void __iomem *base, const char *name)
+static inline void sp804_clocksource_init(void __iomem *base, unsigned long phys,
+ const char *name)
{
- __sp804_clocksource_and_sched_clock_init(base, name, NULL, 0);
+ __sp804_clocksource_and_sched_clock_init(base, phys, name, NULL, 0);
}
static inline void sp804_clocksource_and_sched_clock_init(void __iomem *base,
+ unsigned long phys,
const char *name)
{
- __sp804_clocksource_and_sched_clock_init(base, name, NULL, 1);
+ __sp804_clocksource_and_sched_clock_init(base, phys, name, NULL, 1);
}
static inline void sp804_clockevents_init(void __iomem *base, unsigned int irq, const char *name)
diff --git a/include/ipipe/setup.h b/include/ipipe/setup.h
new file mode 100644
index 000000000000..c2bc5218cf65
--- /dev/null
+++ b/include/ipipe/setup.h
@@ -0,0 +1,10 @@
+#ifndef _IPIPE_SETUP_H
+#define _IPIPE_SETUP_H
+
+/*
+ * Placeholders for setup hooks defined by client domains.
+ */
+
+static inline void __ipipe_early_client_setup(void) { }
+
+#endif /* !_IPIPE_SETUP_H */
diff --git a/include/ipipe/thread_info.h b/include/ipipe/thread_info.h
new file mode 100644
index 000000000000..7038c12942c8
--- /dev/null
+++ b/include/ipipe/thread_info.h
@@ -0,0 +1,18 @@
+#ifndef _IPIPE_THREAD_INFO_H
+#define _IPIPE_THREAD_INFO_H
+
+/*
+ * Placeholder for private thread information defined by client
+ * domains.
+ */
+
+struct ipipe_threadinfo {
+	void *ptd[4];
+};
+
+static inline void __ipipe_init_threadinfo(struct ipipe_threadinfo *p)
+{
+	p->ptd[0] = p->ptd[1] = p->ptd[2] = p->ptd[3] = NULL;
+}
+
+#endif /* !_IPIPE_THREAD_INFO_H */
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 8ae9a95ebf5b..ab59237a85d7 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -129,6 +129,15 @@ struct clock_event_device {
const struct cpumask *cpumask;
struct list_head list;
struct module *owner;
+
+#ifdef CONFIG_IPIPE
+ struct ipipe_timer *ipipe_timer;
+ unsigned ipipe_stolen;
+
+#define clockevent_ipipe_stolen(evt) ((evt)->ipipe_stolen)
+#else
+#define clockevent_ipipe_stolen(evt) (0)
+#endif /* !CONFIG_IPIPE */
} ____cacheline_aligned;
/* Helpers to verify state of a clockevent device */
diff --git a/include/linux/console.h b/include/linux/console.h
index d1d03c9c7a51..7964576a8dea 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -141,10 +141,12 @@ static inline int con_debug_leave(void)
#define CON_ANYTIME (16) /* Safe to call when cpu is offline */
#define CON_BRL (32) /* Used for a braille device */
#define CON_EXTENDED (64) /* Use the extended output format a la /dev/kmsg */
+#define CON_RAW (128) /* Supports raw write mode */
struct console {
char name[16];
void (*write)(struct console *, const char *, unsigned);
+ void (*write_raw)(struct console *, const char *, unsigned);
int (*read)(struct console *, char *, unsigned);
struct tty_driver *(*device)(struct console *, int *);
void (*unblank)(void);
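
The new CON_RAW flag and ->write_raw() handler let a console driver expose an output path that is safe to call with hard IRQs off, from any pipeline domain; this is the hook used by the raw printk path elsewhere in this patch. A hedged sketch of a driver opting in, where myserial_tx() and myserial_tx_poll() are made-up hardware helpers:

    #include <linux/console.h>

    static void myserial_write(struct console *con, const char *s, unsigned n)
    {
            myserial_tx(s, n);              /* regular, lock-taking path */
    }

    static void myserial_write_raw(struct console *con, const char *s, unsigned n)
    {
            myserial_tx_poll(s, n);         /* polls the UART, no locks, hard IRQs may be off */
    }

    static struct console myserial_console = {
            .name           = "ttyMY",
            .write          = myserial_write,
            .write_raw      = myserial_write_raw,
            .flags          = CON_PRINTBUFFER | CON_RAW,
            .index          = -1,
    };
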
diff --git a/include/linux/dw_apb_timer.h b/include/linux/dw_apb_timer.h
index 14f072edbca5..66575506377b 100644
--- a/include/linux/dw_apb_timer.h
+++ b/include/linux/dw_apb_timer.h
@@ -32,6 +32,7 @@ struct dw_apb_clock_event_device {
struct dw_apb_clocksource {
struct dw_apb_timer timer;
struct clocksource cs;
+ unsigned long phys;
};
void dw_apb_clockevent_register(struct dw_apb_clock_event_device *dw_ced);
@@ -44,7 +45,7 @@ dw_apb_clockevent_init(int cpu, const char *name, unsigned rating,
void __iomem *base, int irq, unsigned long freq);
struct dw_apb_clocksource *
dw_apb_clocksource_init(unsigned rating, const char *name, void __iomem *base,
- unsigned long freq);
+ unsigned long phys, unsigned long freq);
void dw_apb_clocksource_register(struct dw_apb_clocksource *dw_cs);
void dw_apb_clocksource_start(struct dw_apb_clocksource *dw_cs);
u64 dw_apb_clocksource_read(struct dw_apb_clocksource *dw_cs);
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9141f2263286..dbe1ac371a1d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -160,6 +160,7 @@ enum {
FTRACE_OPS_FL_PID = 1 << 13,
FTRACE_OPS_FL_RCU = 1 << 14,
FTRACE_OPS_FL_TRACE_ARRAY = 1 << 15,
+ FTRACE_OPS_FL_IPIPE_EXCLUSIVE = 1 << 17,
};
#ifdef CONFIG_DYNAMIC_FTRACE
diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index 5dd9c982e2cb..c3699ef94760 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -392,7 +392,7 @@ struct gpio_chip {
void __iomem *reg_dir_in;
bool bgpio_dir_unreadable;
int bgpio_bits;
- spinlock_t bgpio_lock;
+ ipipe_spinlock_t bgpio_lock;
unsigned long bgpio_data;
unsigned long bgpio_dir;
#endif /* CONFIG_GPIO_GENERIC */
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index da0af631ded5..1b8f0fd221a1 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -6,6 +6,7 @@
#include <linux/lockdep.h>
#include <linux/ftrace_irq.h>
#include <linux/vtime.h>
+#include <linux/ipipe.h>
#include <asm/hardirq.h>
@@ -67,6 +68,7 @@ extern void irq_exit(void);
#define nmi_enter() \
do { \
+ __ipipe_nmi_enter(); \
arch_nmi_enter(); \
printk_nmi_enter(); \
lockdep_off(); \
@@ -87,6 +89,7 @@ extern void irq_exit(void);
lockdep_on(); \
printk_nmi_exit(); \
arch_nmi_exit(); \
+ __ipipe_nmi_exit(); \
} while (0)
#endif /* LINUX_HARDIRQ_H */
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 30e92536c78c..424d8daa68d3 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -504,6 +504,23 @@ extern bool force_irqthreads;
#define hard_irq_disable() do { } while(0)
#endif
+/*
+ * Unlike what other virtualized interrupt disabling schemes may assume, we
+ * can't expect local_irq_restore() to turn hard interrupts on when
+ * pipelining. hard_irq_enable() is introduced to be paired with
+ * hard_irq_disable(), for unconditionally turning them on. The only
+ * sane sequence mixing virtual and real disable state manipulation
+ * is:
+ *
+ * 1. local_irq_save/disable
+ * 2. hard_irq_disable
+ * 3. hard_irq_enable
+ * 4. local_irq_restore/enable
+ */
+#ifndef hard_irq_enable
+#define hard_irq_enable() hard_cond_local_irq_enable()
+#endif
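
A sketch of the only sanctioned nesting described above; example_mixed_masking() is a made-up name, and the body stands for whatever critical section must exclude both root-domain and out-of-band preemption:

    static void example_mixed_masking(void)
    {
            unsigned long flags;

            local_irq_save(flags);          /* 1. virtually disable (root domain) */
            hard_irq_disable();             /* 2. really mask the CPU */

            /* ...critical section... */

            hard_irq_enable();              /* 3. really unmask the CPU */
            local_irq_restore(flags);       /* 4. restore the virtual state */
    }
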
+
/* PLEASE, avoid to allocate new softirqs, if you need not _really_ high
frequency threaded job scheduling. For almost all the purposes
tasklets are more than enough. F.e. all serial device BHs et
diff --git a/include/linux/ipipe.h b/include/linux/ipipe.h
new file mode 100644
index 000000000000..fe90cb55d462
--- /dev/null
+++ b/include/linux/ipipe.h
@@ -0,0 +1,721 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe.h
+ *
+ * Copyright (C) 2002-2014 Philippe Gerum.
+ * 2007 Jan Kiszka.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_H
+#define __LINUX_IPIPE_H
+
+#include <linux/spinlock.h>
+#include <linux/cache.h>
+#include <linux/percpu.h>
+#include <linux/irq.h>
+#include <linux/thread_info.h>
+#include <linux/ipipe_debug.h>
+#include <asm/ptrace.h>
+#ifdef CONFIG_HAVE_IPIPE_SUPPORT
+#include <asm/ipipe.h>
+#endif
+
+struct cpuidle_device;
+struct cpuidle_state;
+struct kvm_vcpu;
+struct ipipe_vm_notifier;
+struct irq_desc;
+struct task_struct;
+struct mm_struct;
+
+#ifdef CONFIG_IPIPE
+
+#include <linux/ipipe_domain.h>
+
+#define IPIPE_CORE_APIREV CONFIG_IPIPE_CORE_APIREV
+
+#include <linux/ipipe_domain.h>
+#include <linux/compiler.h>
+#include <linux/linkage.h>
+#include <asm/ipipe_base.h>
+
+struct pt_regs;
+struct ipipe_domain;
+
+struct ipipe_vm_notifier {
+ void (*handler)(struct ipipe_vm_notifier *nfy);
+};
+
+static inline int ipipe_virtual_irq_p(unsigned int irq)
+{
+ return irq >= IPIPE_VIRQ_BASE && irq < IPIPE_NR_IRQS;
+}
+
+void __ipipe_init_early(void);
+
+void __ipipe_init(void);
+
+#ifdef CONFIG_PROC_FS
+void __ipipe_init_proc(void);
+#ifdef CONFIG_IPIPE_TRACE
+void __ipipe_init_tracer(void);
+#else /* !CONFIG_IPIPE_TRACE */
+static inline void __ipipe_init_tracer(void) { }
+#endif /* CONFIG_IPIPE_TRACE */
+#else /* !CONFIG_PROC_FS */
+static inline void __ipipe_init_proc(void) { }
+#endif /* CONFIG_PROC_FS */
+
+void __ipipe_restore_root_nosync(unsigned long x);
+
+#define IPIPE_IRQF_NOACK 0x1
+#define IPIPE_IRQF_NOSYNC 0x2
+
+void __ipipe_dispatch_irq(unsigned int irq, int flags);
+
+void __ipipe_do_sync_stage(void);
+
+void __ipipe_do_sync_pipeline(struct ipipe_domain *top);
+
+void __ipipe_lock_irq(unsigned int irq);
+
+void __ipipe_unlock_irq(unsigned int irq);
+
+void __ipipe_do_critical_sync(unsigned int irq, void *cookie);
+
+void __ipipe_ack_edge_irq(struct irq_desc *desc);
+
+void __ipipe_nop_irq(struct irq_desc *desc);
+
+static inline void __ipipe_idle(void)
+{
+ ipipe_unstall_root();
+}
+
+#ifndef __ipipe_sync_check
+#define __ipipe_sync_check 1
+#endif
+
+static inline void __ipipe_sync_stage(void)
+{
+ if (likely(__ipipe_sync_check))
+ __ipipe_do_sync_stage();
+}
+
+#ifndef __ipipe_run_irqtail
+#define __ipipe_run_irqtail(irq) do { } while(0)
+#endif
+
+int __ipipe_log_printk(const char *fmt, va_list args);
+void __ipipe_flush_printk(unsigned int irq, void *cookie);
+
+#define __ipipe_get_cpu(flags) ({ (flags) = hard_preempt_disable(); ipipe_processor_id(); })
+#define __ipipe_put_cpu(flags) hard_preempt_enable(flags)
+
+int __ipipe_notify_kevent(int event, void *data);
+
+#define __ipipe_report_sigwake(p) \
+ do { \
+ if (ipipe_notifier_enabled_p(p)) \
+ __ipipe_notify_kevent(IPIPE_KEVT_SIGWAKE, p); \
+ } while (0)
+
+struct ipipe_cpu_migration_data {
+ struct task_struct *task;
+ int dest_cpu;
+};
+
+#define __ipipe_report_setaffinity(__p, __dest_cpu) \
+ do { \
+ struct ipipe_cpu_migration_data d = { \
+ .task = (__p), \
+ .dest_cpu = (__dest_cpu), \
+ }; \
+ if (ipipe_notifier_enabled_p(__p)) \
+ __ipipe_notify_kevent(IPIPE_KEVT_SETAFFINITY, &d); \
+ } while (0)
+
+#define __ipipe_report_exit(p) \
+ do { \
+ if (ipipe_notifier_enabled_p(p)) \
+ __ipipe_notify_kevent(IPIPE_KEVT_EXIT, p); \
+ } while (0)
+
+#define __ipipe_report_setsched(p) \
+ do { \
+ if (ipipe_notifier_enabled_p(p)) \
+ __ipipe_notify_kevent(IPIPE_KEVT_SETSCHED, p); \
+ } while (0)
+
+#define __ipipe_report_schedule(prev, next) \
+do { \
+ if (ipipe_notifier_enabled_p(next) || \
+ ipipe_notifier_enabled_p(prev)) { \
+ __this_cpu_write(ipipe_percpu.rqlock_owner, prev); \
+ __ipipe_notify_kevent(IPIPE_KEVT_SCHEDULE, next); \
+ } \
+} while (0)
+
+#define __ipipe_report_cleanup(mm) \
+ __ipipe_notify_kevent(IPIPE_KEVT_CLEANUP, mm)
+
+#define __ipipe_report_clockfreq_update(freq) \
+ __ipipe_notify_kevent(IPIPE_KEVT_CLOCKFREQ, &(freq))
+
+struct ipipe_ptrace_resume_data {
+ struct task_struct *task;
+ long request;
+};
+
+#define __ipipe_report_ptrace_resume(__p, __request) \
+ do { \
+ struct ipipe_ptrace_resume_data d = { \
+ .task = (__p), \
+ .request = (__request), \
+ }; \
+ if (ipipe_notifier_enabled_p(__p)) \
+ __ipipe_notify_kevent(IPIPE_KEVT_PTRESUME, &d); \
+ } while (0)
+
+int __ipipe_notify_syscall(struct pt_regs *regs);
+
+int __ipipe_notify_trap(int exception, struct pt_regs *regs);
+
+#define __ipipe_report_trap(exception, regs) \
+ __ipipe_notify_trap(exception, regs)
+
+void __ipipe_call_mayday(struct pt_regs *regs);
+
+int __ipipe_notify_user_intreturn(void);
+
+#define __ipipe_serial_debug(__fmt, __args...) raw_printk(__fmt, ##__args)
+
+struct ipipe_trap_data {
+ int exception;
+ struct pt_regs *regs;
+};
+
+/* ipipe_set_hooks(..., enables) */
+#define IPIPE_SYSCALL __IPIPE_SYSCALL_E
+#define IPIPE_TRAP __IPIPE_TRAP_E
+#define IPIPE_KEVENT __IPIPE_KEVENT_E
+
+struct ipipe_sysinfo {
+ int sys_nr_cpus; /* Number of CPUs on board */
+ int sys_hrtimer_irq; /* hrtimer device IRQ */
+ u64 sys_hrtimer_freq; /* hrtimer device frequency */
+ u64 sys_hrclock_freq; /* hrclock device frequency */
+ u64 sys_cpu_freq; /* CPU frequency (Hz) */
+ struct ipipe_arch_sysinfo arch;
+};
+
+struct ipipe_work_header {
+ size_t size;
+ void (*handler)(struct ipipe_work_header *work);
+};
+
+extern unsigned int __ipipe_printk_virq;
+
+void __ipipe_set_irq_pending(struct ipipe_domain *ipd, unsigned int irq);
+
+void __ipipe_complete_domain_migration(void);
+
+int __ipipe_switch_tail(void);
+
+int __ipipe_migrate_head(void);
+
+void __ipipe_reenter_root(void);
+
+void __ipipe_share_current(int flags);
+
+void __ipipe_arch_share_current(int flags);
+
+int __ipipe_disable_ondemand_mappings(struct task_struct *p);
+
+int __ipipe_pin_vma(struct mm_struct *mm, struct vm_area_struct *vma);
+
+/*
+ * Obsolete - no arch implements PIC muting anymore. Null helpers are
+ * kept for building legacy co-kernel releases.
+ */
+static inline void ipipe_mute_pic(void) { }
+static inline void ipipe_unmute_pic(void) { }
+
+#ifdef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
+
+#define prepare_arch_switch(next) \
+ do { \
+ hard_local_irq_enable(); \
+ __ipipe_report_schedule(current, next); \
+ } while(0)
+
+#ifndef ipipe_get_active_mm
+static inline struct mm_struct *ipipe_get_active_mm(void)
+{
+ return __this_cpu_read(ipipe_percpu.active_mm);
+}
+#define ipipe_get_active_mm ipipe_get_active_mm
+#endif
+
+#else /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+
+#define prepare_arch_switch(next) \
+ do { \
+ __ipipe_report_schedule(current, next); \
+ hard_local_irq_disable(); \
+ } while(0)
+
+#ifndef ipipe_get_active_mm
+#define ipipe_get_active_mm() (current->active_mm)
+#endif
+
+#endif /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+
+static inline bool __ipipe_hrclock_ok(void)
+{
+ return __ipipe_hrclock_freq != 0;
+}
+
+static inline void __ipipe_nmi_enter(void)
+{
+ __this_cpu_write(ipipe_percpu.nmi_state, __ipipe_root_status);
+ __set_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+ ipipe_save_context_nmi();
+}
+
+static inline void __ipipe_nmi_exit(void)
+{
+ ipipe_restore_context_nmi();
+ if (!test_bit(IPIPE_STALL_FLAG, raw_cpu_ptr(&ipipe_percpu.nmi_state)))
+ __clear_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+}
+
+/* KVM-side calls, hw IRQs off. */
+static inline void __ipipe_enter_vm(struct ipipe_vm_notifier *vmf)
+{
+ struct ipipe_percpu_data *p;
+
+ p = raw_cpu_ptr(&ipipe_percpu);
+ p->vm_notifier = vmf;
+ barrier();
+}
+
+static inline void __ipipe_exit_vm(void)
+{
+ struct ipipe_percpu_data *p;
+
+ p = raw_cpu_ptr(&ipipe_percpu);
+ p->vm_notifier = NULL;
+ barrier();
+}
+
+/* Client-side call, hw IRQs off. */
+void __ipipe_notify_vm_preemption(void);
+
+static inline void __ipipe_sync_pipeline(struct ipipe_domain *top)
+{
+ if (__ipipe_current_domain != top) {
+ __ipipe_do_sync_pipeline(top);
+ return;
+ }
+ if (!test_bit(IPIPE_STALL_FLAG, &ipipe_this_cpu_context(top)->status))
+ __ipipe_sync_stage();
+}
+
+void ipipe_register_head(struct ipipe_domain *ipd,
+ const char *name);
+
+void ipipe_unregister_head(struct ipipe_domain *ipd);
+
+int ipipe_request_irq(struct ipipe_domain *ipd,
+ unsigned int irq,
+ ipipe_irq_handler_t handler,
+ void *cookie,
+ ipipe_irq_ackfn_t ackfn);
+
+void ipipe_free_irq(struct ipipe_domain *ipd,
+ unsigned int irq);
+
+void ipipe_raise_irq(unsigned int irq);
+
+void ipipe_set_hooks(struct ipipe_domain *ipd,
+ int enables);
+
+int ipipe_handle_syscall(struct thread_info *ti,
+ unsigned long nr, struct pt_regs *regs);
+
+unsigned int ipipe_alloc_virq(void);
+
+void ipipe_free_virq(unsigned int virq);
+
+static inline void ipipe_post_irq_head(unsigned int irq)
+{
+ __ipipe_set_irq_pending(ipipe_head_domain, irq);
+}
+
+static inline void ipipe_post_irq_root(unsigned int irq)
+{
+ __ipipe_set_irq_pending(&ipipe_root, irq);
+}
+
+static inline void ipipe_stall_head(void)
+{
+ hard_local_irq_disable();
+ __set_bit(IPIPE_STALL_FLAG, &__ipipe_head_status);
+}
+
+static inline unsigned long ipipe_test_and_stall_head(void)
+{
+ hard_local_irq_disable();
+ return __test_and_set_bit(IPIPE_STALL_FLAG, &__ipipe_head_status);
+}
+
+static inline unsigned long ipipe_test_head(void)
+{
+ unsigned long flags, ret;
+
+ flags = hard_smp_local_irq_save();
+ ret = test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status);
+ hard_smp_local_irq_restore(flags);
+
+ return ret;
+}
+
+void ipipe_unstall_head(void);
+
+void __ipipe_restore_head(unsigned long x);
+
+static inline void ipipe_restore_head(unsigned long x)
+{
+ ipipe_check_irqoff();
+ if ((x ^ test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status)) & 1)
+ __ipipe_restore_head(x);
+}
+
+void __ipipe_post_work_root(struct ipipe_work_header *work);
+
+#define ipipe_post_work_root(p, header) \
+ do { \
+ void header_not_at_start(void); \
+ if (offsetof(typeof(*(p)), header)) { \
+ header_not_at_start(); \
+ } \
+ __ipipe_post_work_root(&(p)->header); \
+ } while (0)
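
ipipe_post_work_root() is the deferral mechanism a head-domain component can use to get back into regular kernel context: the caller embeds a struct ipipe_work_header as the first member of its request (the offsetof() trick above turns any other layout into a link error), and the handler runs later from the root domain. A sketch under those assumptions, with made-up names; the size field lets the pipeline core copy the descriptor, so posting a stack variable is assumed to be safe here:

    struct my_deferred_work {
            struct ipipe_work_header work;  /* must be the first member */
            int value;
    };

    static void my_work_handler(struct ipipe_work_header *work)
    {
            struct my_deferred_work *p =
                    container_of(work, struct my_deferred_work, work);

            /* Runs from the root domain, regular kernel context. */
            pr_info("deferred value: %d\n", p->value);
    }

    static void post_from_head_domain(int value)
    {
            struct my_deferred_work req = {
                    .work = {
                            .size = sizeof(req),
                            .handler = my_work_handler,
                    },
                    .value = value,
            };

            ipipe_post_work_root(&req, work);
    }
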
+
+int ipipe_get_sysinfo(struct ipipe_sysinfo *sysinfo);
+
+unsigned long ipipe_critical_enter(void (*syncfn)(void));
+
+void ipipe_critical_exit(unsigned long flags);
+
+void ipipe_prepare_panic(void);
+
+#ifdef CONFIG_SMP
+#ifndef ipipe_smp_p
+#define ipipe_smp_p (1)
+#endif
+int ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask);
+void ipipe_send_ipi(unsigned int ipi, cpumask_t cpumask);
+#else /* !CONFIG_SMP */
+#define ipipe_smp_p (0)
+static inline
+int ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask) { return 0; }
+static inline void ipipe_send_ipi(unsigned int ipi, cpumask_t cpumask) { }
+static inline void ipipe_disable_smp(void) { }
+#endif /* CONFIG_SMP */
+
+static inline void ipipe_restore_root_nosync(unsigned long x)
+{
+ unsigned long flags;
+
+ flags = hard_smp_local_irq_save();
+ __ipipe_restore_root_nosync(x);
+ hard_smp_local_irq_restore(flags);
+}
+
+/* Must be called hw IRQs off. */
+static inline void ipipe_lock_irq(unsigned int irq)
+{
+ struct ipipe_domain *ipd = __ipipe_current_domain;
+ if (ipd == ipipe_root_domain)
+ __ipipe_lock_irq(irq);
+}
+
+/* Must be called hw IRQs off. */
+static inline void ipipe_unlock_irq(unsigned int irq)
+{
+ struct ipipe_domain *ipd = __ipipe_current_domain;
+ if (ipd == ipipe_root_domain)
+ __ipipe_unlock_irq(irq);
+}
+
+static inline struct ipipe_threadinfo *ipipe_current_threadinfo(void)
+{
+ return &current_thread_info()->ipipe_data;
+}
+
+#define ipipe_task_threadinfo(p) (&task_thread_info(p)->ipipe_data)
+
+int ipipe_enable_irq(unsigned int irq);
+
+static inline void ipipe_disable_irq(unsigned int irq)
+{
+ struct irq_desc *desc;
+ struct irq_chip *chip;
+
+ desc = irq_to_desc(irq);
+ if (desc == NULL)
+ return;
+
+ chip = irq_desc_get_chip(desc);
+
+ if (WARN_ON_ONCE(chip->irq_disable == NULL && chip->irq_mask == NULL))
+ return;
+
+ if (chip->irq_disable)
+ chip->irq_disable(&desc->irq_data);
+ else
+ chip->irq_mask(&desc->irq_data);
+}
+
+static inline void ipipe_end_irq(unsigned int irq)
+{
+ struct irq_desc *desc = irq_to_desc(irq);
+
+ if (desc)
+ desc->ipipe_end(desc);
+}
+
+static inline int ipipe_chained_irq_p(struct irq_desc *desc)
+{
+ void __ipipe_chained_irq(struct irq_desc *desc);
+
+ return desc->handle_irq == __ipipe_chained_irq;
+}
+
+static inline void ipipe_handle_demuxed_irq(unsigned int cascade_irq)
+{
+ ipipe_trace_irq_entry(cascade_irq);
+ __ipipe_dispatch_irq(cascade_irq, IPIPE_IRQF_NOSYNC);
+ ipipe_trace_irq_exit(cascade_irq);
+}
+
+static inline void __ipipe_init_threadflags(struct thread_info *ti)
+{
+ ti->ipipe_flags = 0;
+}
+
+static inline
+void ipipe_set_ti_thread_flag(struct thread_info *ti, int flag)
+{
+ set_bit(flag, &ti->ipipe_flags);
+}
+
+static inline
+void ipipe_clear_ti_thread_flag(struct thread_info *ti, int flag)
+{
+ clear_bit(flag, &ti->ipipe_flags);
+}
+
+static inline
+int ipipe_test_and_clear_ti_thread_flag(struct thread_info *ti, int flag)
+{
+	return test_and_clear_bit(flag, &ti->ipipe_flags);
+}
+
+static inline
+int ipipe_test_ti_thread_flag(struct thread_info *ti, int flag)
+{
+ return test_bit(flag, &ti->ipipe_flags);
+}
+
+#define ipipe_set_thread_flag(flag) \
+ ipipe_set_ti_thread_flag(current_thread_info(), flag)
+
+#define ipipe_clear_thread_flag(flag) \
+ ipipe_clear_ti_thread_flag(current_thread_info(), flag)
+
+#define ipipe_test_and_clear_thread_flag(flag) \
+ ipipe_test_and_clear_ti_thread_flag(current_thread_info(), flag)
+
+#define ipipe_test_thread_flag(flag) \
+ ipipe_test_ti_thread_flag(current_thread_info(), flag)
+
+#define ipipe_enable_notifier(p) \
+ ipipe_set_ti_thread_flag(task_thread_info(p), TIP_NOTIFY)
+
+#define ipipe_disable_notifier(p) \
+ do { \
+ struct thread_info *ti = task_thread_info(p); \
+ ipipe_clear_ti_thread_flag(ti, TIP_NOTIFY); \
+ ipipe_clear_ti_thread_flag(ti, TIP_MAYDAY); \
+ } while (0)
+
+#define ipipe_notifier_enabled_p(p) \
+ ipipe_test_ti_thread_flag(task_thread_info(p), TIP_NOTIFY)
+
+#define ipipe_raise_mayday(p) \
+ do { \
+ struct thread_info *ti = task_thread_info(p); \
+ ipipe_check_irqoff(); \
+ if (ipipe_test_ti_thread_flag(ti, TIP_NOTIFY)) \
+ ipipe_set_ti_thread_flag(ti, TIP_MAYDAY); \
+ } while (0)
+
+#define ipipe_enable_user_intret_notifier() \
+ ipipe_set_thread_flag(TIP_USERINTRET)
+
+#define ipipe_disable_user_intret_notifier() \
+ ipipe_clear_thread_flag(TIP_USERINTRET)
+
+#define ipipe_user_intret_notifier_enabled(ti) \
+ ipipe_test_ti_thread_flag(ti, TIP_USERINTRET)
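
These thread flags are how a co-kernel tells the pipeline which tasks it cares about: once TIP_NOTIFY is set on a task, the __ipipe_report_*() hooks earlier in this header start forwarding IPIPE_KEVT_* events for it to the handlers installed with ipipe_set_hooks(). A brief sketch with made-up function names:

    /* Called when the co-kernel starts managing a task. */
    static void cokernel_attach_task(struct task_struct *p)
    {
            ipipe_enable_notifier(p);
    }

    /* Called when the task leaves the co-kernel's control. */
    static void cokernel_detach_task(struct task_struct *p)
    {
            ipipe_disable_notifier(p);      /* also clears any pending MAYDAY */
    }
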
+
+#ifdef CONFIG_IPIPE_TRACE
+void __ipipe_tracer_hrclock_initialized(void);
+#else /* !CONFIG_IPIPE_TRACE */
+#define __ipipe_tracer_hrclock_initialized() do { } while(0)
+#endif /* !CONFIG_IPIPE_TRACE */
+
+#ifdef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
+#define ipipe_mm_switch_protect(__flags) do { (void)(__flags); } while (0)
+#define ipipe_mm_switch_unprotect(__flags) do { (void)(__flags); } while (0)
+#else /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+#define ipipe_mm_switch_protect(__flags) \
+ do { \
+ (__flags) = hard_local_irq_save(); \
+ } while (0)
+#define ipipe_mm_switch_unprotect(__flags) \
+ do { \
+ hard_local_irq_restore(__flags); \
+ } while (0)
+#endif /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+
+bool ipipe_enter_cpuidle(struct cpuidle_device *dev,
+ struct cpuidle_state *state);
+
+#else /* !CONFIG_IPIPE */
+
+static inline void __ipipe_init_early(void) { }
+
+static inline void __ipipe_init(void) { }
+
+static inline void __ipipe_init_proc(void) { }
+
+static inline void __ipipe_idle(void) { }
+
+static inline void __ipipe_report_sigwake(struct task_struct *p) { }
+
+static inline void __ipipe_report_setaffinity(struct task_struct *p,
+ int dest_cpu) { }
+
+static inline void __ipipe_report_setsched(struct task_struct *p) { }
+
+static inline void __ipipe_report_exit(struct task_struct *p) { }
+
+static inline void __ipipe_report_cleanup(struct mm_struct *mm) { }
+
+static inline void __ipipe_report_ptrace_resume(struct task_struct *p,
+ long request) { }
+
+#define __ipipe_report_trap(exception, regs) 0
+
+#define hard_preempt_disable() ({ preempt_disable(); 0; })
+#define hard_preempt_enable(flags) ({ preempt_enable(); (void)(flags); })
+
+#define __ipipe_get_cpu(flags) ({ (void)(flags); get_cpu(); })
+#define __ipipe_put_cpu(flags) \
+ do { \
+ (void)(flags); \
+ put_cpu(); \
+ } while (0)
+
+#define __ipipe_root_tick_p(regs) 1
+
+#define ipipe_handle_domain_irq(__domain, __hwirq, __regs) \
+ handle_domain_irq(__domain, __hwirq, __regs)
+
+#define ipipe_handle_demuxed_irq(irq) generic_handle_irq(irq)
+
+#define __ipipe_enter_vm(vmf) do { } while (0)
+
+static inline void __ipipe_exit_vm(void) { }
+
+static inline void __ipipe_notify_vm_preemption(void) { }
+
+#define __ipipe_notify_user_intreturn() 0
+
+#define __ipipe_serial_debug(__fmt, __args...) do { } while (0)
+
+#define __ipipe_root_p 1
+#define ipipe_root_p 1
+
+#define ipipe_mm_switch_protect(__flags) do { (void)(__flags); } while (0)
+#define ipipe_mm_switch_unprotect(__flags) do { (void)(__flags); } while (0)
+
+static inline void __ipipe_init_threadflags(struct thread_info *ti) { }
+
+static inline void __ipipe_complete_domain_migration(void) { }
+
+static inline int __ipipe_switch_tail(void)
+{
+ return 0;
+}
+
+static inline void __ipipe_nmi_enter(void) { }
+
+static inline void __ipipe_nmi_exit(void) { }
+
+#define ipipe_processor_id() smp_processor_id()
+
+static inline void ipipe_lock_irq(unsigned int irq) { }
+
+static inline void ipipe_unlock_irq(unsigned int irq) { }
+
+static inline
+int ipipe_handle_syscall(struct thread_info *ti,
+ unsigned long nr, struct pt_regs *regs)
+{
+ return 0;
+}
+
+static inline
+bool ipipe_enter_cpuidle(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ return true;
+}
+
+#define ipipe_user_intret_notifier_enabled(ti) 0
+
+#endif /* !CONFIG_IPIPE */
+
+#ifdef CONFIG_IPIPE_WANT_PTE_PINNING
+void __ipipe_pin_mapping_globally(unsigned long start,
+ unsigned long end);
+#else
+static inline void __ipipe_pin_mapping_globally(unsigned long start,
+ unsigned long end)
+{ }
+#endif
+
+#ifndef ipipe_root_nr_syscalls
+#define ipipe_root_nr_syscalls(ti) NR_syscalls
+#endif
+
+#endif /* !__LINUX_IPIPE_H */
diff --git a/include/linux/ipipe_debug.h b/include/linux/ipipe_debug.h
new file mode 100644
index 000000000000..5d7efefbdddf
--- /dev/null
+++ b/include/linux/ipipe_debug.h
@@ -0,0 +1,100 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe_debug.h
+ *
+ * Copyright (C) 2012 Philippe Gerum <rpm@xenomai.org>.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_DEBUG_H
+#define __LINUX_IPIPE_DEBUG_H
+
+#include <linux/ipipe_domain.h>
+
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+
+#include <asm/bug.h>
+
+static inline int ipipe_disable_context_check(void)
+{
+ return xchg(raw_cpu_ptr(&ipipe_percpu.context_check), 0);
+}
+
+static inline void ipipe_restore_context_check(int old_state)
+{
+ __this_cpu_write(ipipe_percpu.context_check, old_state);
+}
+
+static inline void ipipe_context_check_off(void)
+{
+ int cpu;
+ for_each_online_cpu(cpu)
+ per_cpu(ipipe_percpu, cpu).context_check = 0;
+}
+
+static inline void ipipe_save_context_nmi(void)
+{
+ int state = ipipe_disable_context_check();
+ __this_cpu_write(ipipe_percpu.context_check_saved, state);
+}
+
+static inline void ipipe_restore_context_nmi(void)
+{
+ ipipe_restore_context_check(__this_cpu_read(ipipe_percpu.context_check_saved));
+}
+
+#else /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+
+static inline int ipipe_disable_context_check(void)
+{
+ return 0;
+}
+
+static inline void ipipe_restore_context_check(int old_state) { }
+
+static inline void ipipe_context_check_off(void) { }
+
+static inline void ipipe_save_context_nmi(void) { }
+
+static inline void ipipe_restore_context_nmi(void) { }
+
+#endif /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+
+#ifdef CONFIG_IPIPE_DEBUG
+
+#define ipipe_check_irqoff() \
+ do { \
+ if (WARN_ON_ONCE(!hard_irqs_disabled())) \
+ hard_local_irq_disable(); \
+ } while (0)
+
+#else /* !CONFIG_IPIPE_DEBUG */
+
+static inline void ipipe_check_irqoff(void) { }
+
+#endif /* !CONFIG_IPIPE_DEBUG */
+
+#ifdef CONFIG_IPIPE_DEBUG_INTERNAL
+#define IPIPE_WARN(c) WARN_ON(c)
+#define IPIPE_WARN_ONCE(c) WARN_ON_ONCE(c)
+#define IPIPE_BUG_ON(c) BUG_ON(c)
+#else
+#define IPIPE_WARN(c) do { (void)(c); } while (0)
+#define IPIPE_WARN_ONCE(c) do { (void)(c); } while (0)
+#define IPIPE_BUG_ON(c) do { (void)(c); } while (0)
+#endif
+
+#endif /* !__LINUX_IPIPE_DEBUG_H */
diff --git a/include/linux/ipipe_domain.h b/include/linux/ipipe_domain.h
new file mode 100644
index 000000000000..6c7504d15055
--- /dev/null
+++ b/include/linux/ipipe_domain.h
@@ -0,0 +1,368 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe_domain.h
+ *
+ * Copyright (C) 2007-2012 Philippe Gerum.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_DOMAIN_H
+#define __LINUX_IPIPE_DOMAIN_H
+
+#ifdef CONFIG_IPIPE
+
+#include <linux/mutex.h>
+#include <linux/percpu.h>
+#include <asm/ptrace.h>
+#include <asm/hw_irq.h>
+#include <asm/ipipe_base.h>
+
+struct task_struct;
+struct mm_struct;
+struct irq_desc;
+struct ipipe_vm_notifier;
+
+#define __bpl_up(x) (((x)+(BITS_PER_LONG-1)) & ~(BITS_PER_LONG-1))
+/* Number of virtual IRQs (must be a multiple of BITS_PER_LONG) */
+#define IPIPE_NR_VIRQS BITS_PER_LONG
+/* First virtual IRQ # (must be aligned on BITS_PER_LONG) */
+#define IPIPE_VIRQ_BASE __bpl_up(IPIPE_NR_XIRQS)
+/* Total number of IRQ slots */
+#define IPIPE_NR_IRQS (IPIPE_VIRQ_BASE+IPIPE_NR_VIRQS)
+
+#define IPIPE_IRQ_MAPSZ (IPIPE_NR_IRQS / BITS_PER_LONG)
+#define IPIPE_IRQ_1MAPSZ BITS_PER_LONG
+#if IPIPE_IRQ_MAPSZ > BITS_PER_LONG * BITS_PER_LONG
+/*
+ * We need a 4-level mapping, up to 16M IRQs (64bit long, MAXSMP
+ * defines 512K IRQs).
+ */
+#define __IPIPE_IRQMAP_LEVELS 4
+#define IPIPE_IRQ_2MAPSZ (BITS_PER_LONG * BITS_PER_LONG)
+#elif IPIPE_IRQ_MAPSZ > BITS_PER_LONG
+/*
+ * 3-level mapping. Up to 256K IRQs (64 bit long).
+ */
+#define __IPIPE_IRQMAP_LEVELS 3
+#else
+/*
+ * 2-level mapping is enough. Up to 4K IRQs (64 bit long).
+ */
+#define __IPIPE_IRQMAP_LEVELS 2
+#endif
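
As a quick check of these thresholds, assume a 64-bit build where the architecture defines IPIPE_NR_XIRQS as 1024: IPIPE_VIRQ_BASE rounds up to 1024, IPIPE_NR_IRQS is 1088, and IPIPE_IRQ_MAPSZ is 17 words, which fits the 2-level scheme (one top-level word indexing up to 64 pending words, i.e. 4096 IRQs). The 3-level layout extends this to 64 * 64 words (256K IRQs), and the 4-level one to 64^3 words (16M IRQs), matching the comments above.
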
+
+/* Per-cpu pipeline status */
+#define IPIPE_STALL_FLAG 0 /* interrupts (virtually) disabled. */
+#define IPIPE_STALL_MASK (1L << IPIPE_STALL_FLAG)
+
+/* Interrupt control bits */
+#define IPIPE_HANDLE_FLAG 0
+#define IPIPE_STICKY_FLAG 1
+#define IPIPE_LOCK_FLAG 2
+#define IPIPE_HANDLE_MASK (1 << IPIPE_HANDLE_FLAG)
+#define IPIPE_STICKY_MASK (1 << IPIPE_STICKY_FLAG)
+#define IPIPE_LOCK_MASK (1 << IPIPE_LOCK_FLAG)
+
+#define __IPIPE_SYSCALL_P 0
+#define __IPIPE_TRAP_P 1
+#define __IPIPE_KEVENT_P 2
+#define __IPIPE_SYSCALL_E (1 << __IPIPE_SYSCALL_P)
+#define __IPIPE_TRAP_E (1 << __IPIPE_TRAP_P)
+#define __IPIPE_KEVENT_E (1 << __IPIPE_KEVENT_P)
+#define __IPIPE_ALL_E 0x7
+#define __IPIPE_SYSCALL_R (8 << __IPIPE_SYSCALL_P)
+#define __IPIPE_TRAP_R (8 << __IPIPE_TRAP_P)
+#define __IPIPE_KEVENT_R (8 << __IPIPE_KEVENT_P)
+#define __IPIPE_SHIFT_R 3
+#define __IPIPE_ALL_R (__IPIPE_ALL_E << __IPIPE_SHIFT_R)
+
+#define IPIPE_KEVT_SCHEDULE 0
+#define IPIPE_KEVT_SIGWAKE 1
+#define IPIPE_KEVT_SETSCHED 2
+#define IPIPE_KEVT_SETAFFINITY 3
+#define IPIPE_KEVT_EXIT 4
+#define IPIPE_KEVT_CLEANUP 5
+#define IPIPE_KEVT_HOSTRT 6
+#define IPIPE_KEVT_CLOCKFREQ 7
+#define IPIPE_KEVT_USERINTRET 8
+#define IPIPE_KEVT_PTRESUME 9
+
+typedef void (*ipipe_irq_ackfn_t)(struct irq_desc *desc);
+
+typedef void (*ipipe_irq_handler_t)(unsigned int irq,
+ void *cookie);
+
+struct ipipe_domain {
+ int context_offset;
+ struct ipipe_irqdesc {
+ unsigned long control;
+ ipipe_irq_ackfn_t ackfn;
+ ipipe_irq_handler_t handler;
+ void *cookie;
+ } ____cacheline_aligned irqs[IPIPE_NR_IRQS];
+ const char *name;
+ struct mutex mutex;
+};
+
+static inline void *
+__ipipe_irq_cookie(struct ipipe_domain *ipd, unsigned int irq)
+{
+ return ipd->irqs[irq].cookie;
+}
+
+static inline ipipe_irq_handler_t
+__ipipe_irq_handler(struct ipipe_domain *ipd, unsigned int irq)
+{
+ return ipd->irqs[irq].handler;
+}
+
+extern struct ipipe_domain ipipe_root;
+
+#define ipipe_root_domain (&ipipe_root)
+
+extern struct ipipe_domain *ipipe_head_domain;
+
+struct ipipe_percpu_domain_data {
+ unsigned long status; /* <= Must be first in struct. */
+ unsigned long irqpend_0map;
+#if __IPIPE_IRQMAP_LEVELS >= 3
+ unsigned long irqpend_1map[IPIPE_IRQ_1MAPSZ];
+#if __IPIPE_IRQMAP_LEVELS >= 4
+ unsigned long irqpend_2map[IPIPE_IRQ_2MAPSZ];
+#endif
+#endif
+ unsigned long irqpend_map[IPIPE_IRQ_MAPSZ];
+ unsigned long irqheld_map[IPIPE_IRQ_MAPSZ];
+ unsigned long irqall[IPIPE_NR_IRQS];
+ struct ipipe_domain *domain;
+ int coflags;
+};
+
+struct ipipe_percpu_data {
+ struct ipipe_percpu_domain_data root;
+ struct ipipe_percpu_domain_data head;
+ struct ipipe_percpu_domain_data *curr;
+ struct pt_regs tick_regs;
+ int hrtimer_irq;
+ struct task_struct *task_hijacked;
+ struct task_struct *rqlock_owner;
+ struct ipipe_vm_notifier *vm_notifier;
+ unsigned long nmi_state;
+ struct mm_struct *active_mm;
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+ int context_check;
+ int context_check_saved;
+#endif
+};
+
+/*
+ * CAREFUL: all accessors based on __ipipe_raw_cpu_ptr() you may find
+ * in this file should be used only while hw interrupts are off, to
+ * prevent from CPU migration regardless of the running domain.
+ */
+DECLARE_PER_CPU(struct ipipe_percpu_data, ipipe_percpu);
+
+static inline struct ipipe_percpu_domain_data *
+__context_of(struct ipipe_percpu_data *p, struct ipipe_domain *ipd)
+{
+ return (void *)p + ipd->context_offset;
+}
+
+/**
+ * ipipe_percpu_context - return the address of the pipeline context
+ * data for a domain on a given CPU.
+ *
+ * NOTE: this is the slowest accessor, use it carefully. Prefer
+ * ipipe_this_cpu_context() for requests targeted at the current
+ * CPU. Additionally, if the target domain is known at build time,
+ * consider ipipe_this_cpu_{root, head}_context().
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_percpu_context(struct ipipe_domain *ipd, int cpu)
+{
+ return __context_of(&per_cpu(ipipe_percpu, cpu), ipd);
+}
+
+/**
+ * ipipe_this_cpu_context - return the address of the pipeline context
+ * data for a domain on the current CPU. hw IRQs must be off.
+ *
+ * NOTE: this accessor is a bit faster, but since we don't know which
+ * one of "root" or "head" ipd refers to, we still need to compute the
+ * context address from its offset.
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_this_cpu_context(struct ipipe_domain *ipd)
+{
+ return __context_of(__ipipe_raw_cpu_ptr(&ipipe_percpu), ipd);
+}
+
+/**
+ * ipipe_this_cpu_root_context - return the address of the pipeline
+ * context data for the root domain on the current CPU. hw IRQs must
+ * be off.
+ *
+ * NOTE: this accessor is recommended when the domain we refer to is
+ * known at build time to be the root one.
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_this_cpu_root_context(void)
+{
+ return __ipipe_raw_cpu_ptr(&ipipe_percpu.root);
+}
+
+/**
+ * ipipe_this_cpu_head_context - return the address of the pipeline
+ * context data for the registered head domain on the current CPU. hw
+ * IRQs must be off.
+ *
+ * NOTE: this accessor is recommended when the domain we refer to is
+ * known at build time to be the registered head domain. This address
+ * is always different from the context data of the root domain in
+ * absence of registered head domain. To get the address of the
+ * context data for the domain leading the pipeline at the time of the
+ * call (which may be root in absence of registered head domain), use
+ * ipipe_this_cpu_leading_context() instead.
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_this_cpu_head_context(void)
+{
+ return __ipipe_raw_cpu_ptr(&ipipe_percpu.head);
+}
+
+/**
+ * ipipe_this_cpu_leading_context - return the address of the pipeline
+ * context data for the domain leading the pipeline on the current
+ * CPU. hw IRQs must be off.
+ *
+ * NOTE: this accessor is required when either root or a registered
+ * head domain may be the final target of this call, depending on
+ * whether the high priority domain was installed via
+ * ipipe_register_head().
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_this_cpu_leading_context(void)
+{
+ return ipipe_this_cpu_context(ipipe_head_domain);
+}
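
A sketch of the intended usage pattern for these accessors: hard-disable IRQs first so the CPU cannot migrate, then resolve the context. The account_irq_hit() helper is made up for illustration; the pipeline core performs this bookkeeping itself when it dispatches an interrupt.

    static void account_irq_hit(struct ipipe_domain *ipd, unsigned int irq)
    {
            struct ipipe_percpu_domain_data *ctx;
            unsigned long flags;

            flags = hard_local_irq_save();  /* no CPU migration past this point */
            ctx = ipipe_this_cpu_context(ipd);
            ctx->irqall[irq]++;
            hard_local_irq_restore(flags);
    }
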
+
+/**
+ * __ipipe_get_current_context() - return the address of the pipeline
+ * context data of the domain running on the current CPU. hw IRQs must
+ * be off.
+ */
+static inline struct ipipe_percpu_domain_data *__ipipe_get_current_context(void)
+{
+ return __ipipe_raw_cpu_read(ipipe_percpu.curr);
+}
+
+#define __ipipe_current_context __ipipe_get_current_context()
+
+/**
+ * __ipipe_set_current_context() - switch the current CPU to the
+ * specified domain context. hw IRQs must be off.
+ *
+ * NOTE: this is the only way to change the current domain for the
+ * current CPU. Don't bypass.
+ */
+static inline
+void __ipipe_set_current_context(struct ipipe_percpu_domain_data *pd)
+{
+ struct ipipe_percpu_data *p;
+ p = __ipipe_raw_cpu_ptr(&ipipe_percpu);
+ p->curr = pd;
+}
+
+/**
+ * __ipipe_set_current_domain() - switch the current CPU to the
+ * specified domain. This is equivalent to calling
+ * __ipipe_set_current_context() with the context data of that
+ * domain. hw IRQs must be off.
+ */
+static inline void __ipipe_set_current_domain(struct ipipe_domain *ipd)
+{
+ struct ipipe_percpu_data *p;
+ p = __ipipe_raw_cpu_ptr(&ipipe_percpu);
+ p->curr = __context_of(p, ipd);
+}
+
+static inline struct ipipe_percpu_domain_data *ipipe_current_context(void)
+{
+ struct ipipe_percpu_domain_data *pd;
+ unsigned long flags;
+
+ flags = hard_smp_local_irq_save();
+ pd = __ipipe_get_current_context();
+ hard_smp_local_irq_restore(flags);
+
+ return pd;
+}
+
+static inline struct ipipe_domain *__ipipe_get_current_domain(void)
+{
+ return __ipipe_get_current_context()->domain;
+}
+
+#define __ipipe_current_domain __ipipe_get_current_domain()
+
+/**
+ * ipipe_get_current_domain() - return the address of the pipeline
+ * domain running on the current CPU.
+ */
+static inline struct ipipe_domain *ipipe_get_current_domain(void)
+{
+ struct ipipe_domain *ipd;
+ unsigned long flags;
+
+ flags = hard_smp_local_irq_save();
+ ipd = __ipipe_get_current_domain();
+ hard_smp_local_irq_restore(flags);
+
+ return ipd;
+}
+
+#define ipipe_current_domain ipipe_get_current_domain()
+
+#define __ipipe_root_p (__ipipe_current_domain == ipipe_root_domain)
+#define ipipe_root_p (ipipe_current_domain == ipipe_root_domain)
+
+#ifdef CONFIG_SMP
+#define __ipipe_root_status (ipipe_this_cpu_root_context()->status)
+#else
+extern unsigned long __ipipe_root_status;
+#endif
+
+#define __ipipe_head_status (ipipe_this_cpu_head_context()->status)
+
+/**
+ * __ipipe_ipending_p() - Whether we have interrupts pending
+ * (i.e. logged) for the given domain context on the current CPU. hw
+ * IRQs must be off.
+ */
+static inline int __ipipe_ipending_p(struct ipipe_percpu_domain_data *pd)
+{
+ return pd->irqpend_0map != 0;
+}
+
+static inline unsigned long
+__ipipe_cpudata_irq_hits(struct ipipe_domain *ipd, int cpu, unsigned int irq)
+{
+ return ipipe_percpu_context(ipd, cpu)->irqall[irq];
+}
+
+#endif /* CONFIG_IPIPE */
+
+#endif /* !__LINUX_IPIPE_DOMAIN_H */
diff --git a/include/linux/ipipe_lock.h b/include/linux/ipipe_lock.h
new file mode 100644
index 000000000000..da6188d45501
--- /dev/null
+++ b/include/linux/ipipe_lock.h
@@ -0,0 +1,329 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe_lock.h
+ *
+ * Copyright (C) 2009 Philippe Gerum.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_LOCK_H
+#define __LINUX_IPIPE_LOCK_H
+
+#include <asm-generic/ipipe.h>
+
+typedef struct {
+ arch_spinlock_t arch_lock;
+} __ipipe_spinlock_t;
+
+#define ipipe_spinlock(lock) ((__ipipe_spinlock_t *)(lock))
+#define ipipe_spinlock_p(lock) \
+ __builtin_types_compatible_p(typeof(lock), __ipipe_spinlock_t *) || \
+ __builtin_types_compatible_p(typeof(lock), __ipipe_spinlock_t [])
+
+#define std_spinlock_raw(lock) ((raw_spinlock_t *)(lock))
+#define std_spinlock_raw_p(lock) \
+ __builtin_types_compatible_p(typeof(lock), raw_spinlock_t *) || \
+ __builtin_types_compatible_p(typeof(lock), raw_spinlock_t [])
+
+#ifdef CONFIG_PREEMPT_RT_FULL
+
+#define PICK_SPINLOCK_IRQSAVE(lock, flags) \
+ do { \
+ if (ipipe_spinlock_p(lock)) \
+ (flags) = __ipipe_spin_lock_irqsave(ipipe_spinlock(lock)); \
+ else if (std_spinlock_raw_p(lock)) \
+ __real_raw_spin_lock_irqsave(std_spinlock_raw(lock), flags); \
+ else __bad_lock_type(); \
+ } while (0)
+
+#define PICK_SPINTRYLOCK_IRQSAVE(lock, flags) \
+ ({ \
+ int __ret__; \
+ if (ipipe_spinlock_p(lock)) \
+ __ret__ = __ipipe_spin_trylock_irqsave(ipipe_spinlock(lock), &(flags)); \
+ else if (std_spinlock_raw_p(lock)) \
+ __ret__ = __real_raw_spin_trylock_irqsave(std_spinlock_raw(lock), flags); \
+ else __bad_lock_type(); \
+ __ret__; \
+ })
+
+#define PICK_SPINTRYLOCK_IRQ(lock) \
+ ({ \
+ int __ret__; \
+ if (ipipe_spinlock_p(lock)) \
+ __ret__ = __ipipe_spin_trylock_irq(ipipe_spinlock(lock)); \
+ else if (std_spinlock_raw_p(lock)) \
+ __ret__ = __real_raw_spin_trylock_irq(std_spinlock_raw(lock)); \
+ else __bad_lock_type(); \
+ __ret__; \
+ })
+
+#define PICK_SPINUNLOCK_IRQRESTORE(lock, flags) \
+ do { \
+ if (ipipe_spinlock_p(lock)) \
+ __ipipe_spin_unlock_irqrestore(ipipe_spinlock(lock), flags); \
+ else if (std_spinlock_raw_p(lock)) { \
+ __ipipe_spin_unlock_debug(flags); \
+ __real_raw_spin_unlock_irqrestore(std_spinlock_raw(lock), flags); \
+ } else __bad_lock_type(); \
+ } while (0)
+
+#define PICK_SPINOP(op, lock) \
+ ({ \
+ if (ipipe_spinlock_p(lock)) \
+ arch_spin##op(&ipipe_spinlock(lock)->arch_lock); \
+ else if (std_spinlock_raw_p(lock)) \
+ __real_raw_spin##op(std_spinlock_raw(lock)); \
+ else __bad_lock_type(); \
+ (void)0; \
+ })
+
+#define PICK_SPINOP_RET(op, lock, type) \
+ ({ \
+ type __ret__; \
+ if (ipipe_spinlock_p(lock)) \
+ __ret__ = arch_spin##op(&ipipe_spinlock(lock)->arch_lock); \
+ else if (std_spinlock_raw_p(lock)) \
+ __ret__ = __real_raw_spin##op(std_spinlock_raw(lock)); \
+ else { __ret__ = -1; __bad_lock_type(); } \
+ __ret__; \
+ })
+
+#else /* !CONFIG_PREEMPT_RT_FULL */
+
+#define std_spinlock(lock) ((spinlock_t *)(lock))
+#define std_spinlock_p(lock) \
+ __builtin_types_compatible_p(typeof(lock), spinlock_t *) || \
+ __builtin_types_compatible_p(typeof(lock), spinlock_t [])
+
+#define PICK_SPINLOCK_IRQSAVE(lock, flags) \
+ do { \
+ if (ipipe_spinlock_p(lock)) \
+ (flags) = __ipipe_spin_lock_irqsave(ipipe_spinlock(lock)); \
+ else if (std_spinlock_raw_p(lock)) \
+ __real_raw_spin_lock_irqsave(std_spinlock_raw(lock), flags); \
+ else if (std_spinlock_p(lock)) \
+ __real_raw_spin_lock_irqsave(&std_spinlock(lock)->rlock, flags); \
+ else __bad_lock_type(); \
+ } while (0)
+
+#define PICK_SPINTRYLOCK_IRQSAVE(lock, flags) \
+ ({ \
+ int __ret__; \
+ if (ipipe_spinlock_p(lock)) \
+ __ret__ = __ipipe_spin_trylock_irqsave(ipipe_spinlock(lock), &(flags)); \
+ else if (std_spinlock_raw_p(lock)) \
+ __ret__ = __real_raw_spin_trylock_irqsave(std_spinlock_raw(lock), flags); \
+ else if (std_spinlock_p(lock)) \
+ __ret__ = __real_raw_spin_trylock_irqsave(&std_spinlock(lock)->rlock, flags); \
+ else __bad_lock_type(); \
+ __ret__; \
+ })
+
+#define PICK_SPINTRYLOCK_IRQ(lock) \
+ ({ \
+ int __ret__; \
+ if (ipipe_spinlock_p(lock)) \
+ __ret__ = __ipipe_spin_trylock_irq(ipipe_spinlock(lock)); \
+ else if (std_spinlock_raw_p(lock)) \
+ __ret__ = __real_raw_spin_trylock_irq(std_spinlock_raw(lock)); \
+ else if (std_spinlock_p(lock)) \
+ __ret__ = __real_raw_spin_trylock_irq(&std_spinlock(lock)->rlock); \
+ else __bad_lock_type(); \
+ __ret__; \
+ })
+
+#define PICK_SPINUNLOCK_IRQRESTORE(lock, flags) \
+ do { \
+ if (ipipe_spinlock_p(lock)) \
+ __ipipe_spin_unlock_irqrestore(ipipe_spinlock(lock), flags); \
+ else { \
+ __ipipe_spin_unlock_debug(flags); \
+ if (std_spinlock_raw_p(lock)) \
+ __real_raw_spin_unlock_irqrestore(std_spinlock_raw(lock), flags); \
+ else if (std_spinlock_p(lock)) \
+ __real_raw_spin_unlock_irqrestore(&std_spinlock(lock)->rlock, flags); \
+ } \
+ } while (0)
+
+#define PICK_SPINOP(op, lock) \
+ ({ \
+ if (ipipe_spinlock_p(lock)) \
+ arch_spin##op(&ipipe_spinlock(lock)->arch_lock); \
+ else if (std_spinlock_raw_p(lock)) \
+ __real_raw_spin##op(std_spinlock_raw(lock)); \
+ else if (std_spinlock_p(lock)) \
+ __real_raw_spin##op(&std_spinlock(lock)->rlock); \
+ else __bad_lock_type(); \
+ (void)0; \
+ })
+
+#define PICK_SPINOP_RET(op, lock, type) \
+ ({ \
+ type __ret__; \
+ if (ipipe_spinlock_p(lock)) \
+ __ret__ = arch_spin##op(&ipipe_spinlock(lock)->arch_lock); \
+ else if (std_spinlock_raw_p(lock)) \
+ __ret__ = __real_raw_spin##op(std_spinlock_raw(lock)); \
+ else if (std_spinlock_p(lock)) \
+ __ret__ = __real_raw_spin##op(&std_spinlock(lock)->rlock); \
+ else { __ret__ = -1; __bad_lock_type(); } \
+ __ret__; \
+ })
+
+#endif /* !CONFIG_PREEMPT_RT_FULL */
+
+#define arch_spin_lock_init(lock) \
+ do { \
+ IPIPE_DEFINE_SPINLOCK(__lock__); \
+ *((ipipe_spinlock_t *)lock) = __lock__; \
+ } while (0)
+
+#define arch_spin_lock_irq(lock) \
+ do { \
+ hard_local_irq_disable(); \
+ arch_spin_lock(lock); \
+ } while (0)
+
+#define arch_spin_unlock_irq(lock) \
+ do { \
+ arch_spin_unlock(lock); \
+ hard_local_irq_enable(); \
+ } while (0)
+
+typedef struct {
+ arch_rwlock_t arch_lock;
+} __ipipe_rwlock_t;
+
+#define ipipe_rwlock_p(lock) \
+ __builtin_types_compatible_p(typeof(lock), __ipipe_rwlock_t *)
+
+#define std_rwlock_p(lock) \
+ __builtin_types_compatible_p(typeof(lock), rwlock_t *)
+
+#define ipipe_rwlock(lock) ((__ipipe_rwlock_t *)(lock))
+#define std_rwlock(lock) ((rwlock_t *)(lock))
+
+#define PICK_RWOP(op, lock) \
+ do { \
+ if (ipipe_rwlock_p(lock)) \
+ arch##op(&ipipe_rwlock(lock)->arch_lock); \
+ else if (std_rwlock_p(lock)) \
+ _raw##op(std_rwlock(lock)); \
+ else __bad_lock_type(); \
+ } while (0)
+
+extern int __bad_lock_type(void);
+
+#ifdef CONFIG_IPIPE
+
+#define ipipe_spinlock_t __ipipe_spinlock_t
+#define IPIPE_DEFINE_RAW_SPINLOCK(x) ipipe_spinlock_t x = IPIPE_SPIN_LOCK_UNLOCKED
+#define IPIPE_DECLARE_RAW_SPINLOCK(x) extern ipipe_spinlock_t x
+#define IPIPE_DEFINE_SPINLOCK(x) IPIPE_DEFINE_RAW_SPINLOCK(x)
+#define IPIPE_DECLARE_SPINLOCK(x) IPIPE_DECLARE_RAW_SPINLOCK(x)
+
+#define IPIPE_SPIN_LOCK_UNLOCKED \
+ (__ipipe_spinlock_t) { .arch_lock = __ARCH_SPIN_LOCK_UNLOCKED }
+
+#define spin_lock_irqsave_cond(lock, flags) \
+ spin_lock_irqsave(lock, flags)
+
+#define spin_unlock_irqrestore_cond(lock, flags) \
+ spin_unlock_irqrestore(lock, flags)
+
+#define raw_spin_lock_irqsave_cond(lock, flags) \
+ raw_spin_lock_irqsave(lock, flags)
+
+#define raw_spin_unlock_irqrestore_cond(lock, flags) \
+ raw_spin_unlock_irqrestore(lock, flags)
+
+void __ipipe_spin_lock_irq(ipipe_spinlock_t *lock);
+
+int __ipipe_spin_trylock_irq(ipipe_spinlock_t *lock);
+
+void __ipipe_spin_unlock_irq(ipipe_spinlock_t *lock);
+
+unsigned long __ipipe_spin_lock_irqsave(ipipe_spinlock_t *lock);
+
+int __ipipe_spin_trylock_irqsave(ipipe_spinlock_t *lock,
+ unsigned long *x);
+
+void __ipipe_spin_unlock_irqrestore(ipipe_spinlock_t *lock,
+ unsigned long x);
+
+void __ipipe_spin_unlock_irqbegin(ipipe_spinlock_t *lock);
+
+void __ipipe_spin_unlock_irqcomplete(unsigned long x);
+
+#if defined(CONFIG_IPIPE_DEBUG_INTERNAL) && defined(CONFIG_SMP)
+void __ipipe_spin_unlock_debug(unsigned long flags);
+#else
+#define __ipipe_spin_unlock_debug(flags) do { } while (0)
+#endif
+
+#define ipipe_rwlock_t __ipipe_rwlock_t
+#define IPIPE_DEFINE_RWLOCK(x) ipipe_rwlock_t x = IPIPE_RW_LOCK_UNLOCKED
+#define IPIPE_DECLARE_RWLOCK(x) extern ipipe_rwlock_t x
+
+#define IPIPE_RW_LOCK_UNLOCKED \
+ (__ipipe_rwlock_t) { .arch_lock = __ARCH_RW_LOCK_UNLOCKED }
+
+#else /* !CONFIG_IPIPE */
+
+#define ipipe_spinlock_t spinlock_t
+#define IPIPE_DEFINE_SPINLOCK(x) DEFINE_SPINLOCK(x)
+#define IPIPE_DECLARE_SPINLOCK(x) extern spinlock_t x
+#define IPIPE_SPIN_LOCK_UNLOCKED __SPIN_LOCK_UNLOCKED(unknown)
+#define IPIPE_DEFINE_RAW_SPINLOCK(x) DEFINE_RAW_SPINLOCK(x)
+#define IPIPE_DECLARE_RAW_SPINLOCK(x) extern raw_spinlock_t x
+
+#define spin_lock_irqsave_cond(lock, flags) \
+ do { \
+ (void)(flags); \
+ spin_lock(lock); \
+ } while(0)
+
+#define spin_unlock_irqrestore_cond(lock, flags) \
+ spin_unlock(lock)
+
+#define raw_spin_lock_irqsave_cond(lock, flags) \
+ do { \
+ (void)(flags); \
+ raw_spin_lock(lock); \
+ } while(0)
+
+#define raw_spin_unlock_irqrestore_cond(lock, flags) \
+ raw_spin_unlock(lock)
+
+#define __ipipe_spin_lock_irq(lock) do { } while (0)
+#define __ipipe_spin_unlock_irq(lock) do { } while (0)
+#define __ipipe_spin_lock_irqsave(lock) 0
+#define __ipipe_spin_trylock_irq(lock) 1
+#define __ipipe_spin_trylock_irqsave(lock, x) ({ (void)(x); 1; })
+#define __ipipe_spin_unlock_irqrestore(lock, x) do { (void)(x); } while (0)
+#define __ipipe_spin_unlock_irqbegin(lock) spin_unlock(lock)
+#define __ipipe_spin_unlock_irqcomplete(x) do { (void)(x); } while (0)
+#define __ipipe_spin_unlock_debug(flags) do { } while (0)
+
+#define ipipe_rwlock_t rwlock_t
+#define IPIPE_DEFINE_RWLOCK(x) DEFINE_RWLOCK(x)
+#define IPIPE_DECLARE_RWLOCK(x) extern rwlock_t x
+#define IPIPE_RW_LOCK_UNLOCKED RW_LOCK_UNLOCKED
+
+#endif /* !CONFIG_IPIPE */
+
+#endif /* !__LINUX_IPIPE_LOCK_H */
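
An I-pipe aware lock declared with IPIPE_DEFINE_SPINLOCK() can be shared between root-domain code and out-of-band handlers: assuming the spinlock.h wrappers elsewhere in this patch route the regular lock calls through the PICK_SPIN*() selectors above, spin_lock_irqsave() on such a lock hard-disables IRQs, while a !CONFIG_IPIPE build degrades it to a plain spinlock. A sketch with hypothetical names:

    #include <linux/spinlock.h>

    static IPIPE_DEFINE_SPINLOCK(shared_lock);
    static unsigned int shared_state;       /* hypothetical data shared with out-of-band code */

    static void root_side_update(unsigned int v)
    {
            unsigned long flags;

            spin_lock_irqsave(&shared_lock, flags); /* hard IRQs off on an I-pipe lock */
            shared_state = v;
            spin_unlock_irqrestore(&shared_lock, flags);
    }
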
diff --git a/include/linux/ipipe_tickdev.h b/include/linux/ipipe_tickdev.h
new file mode 100644
index 000000000000..54d1e2daad6e
--- /dev/null
+++ b/include/linux/ipipe_tickdev.h
@@ -0,0 +1,167 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe_tickdev.h
+ *
+ * Copyright (C) 2007 Philippe Gerum.
+ * Copyright (C) 2012 Gilles Chanteperdrix
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_TICKDEV_H
+#define __LINUX_IPIPE_TICKDEV_H
+
+#include <linux/list.h>
+#include <linux/cpumask.h>
+#include <linux/clockchips.h>
+#include <linux/ipipe_domain.h>
+#include <linux/clocksource.h>
+#include <linux/timekeeper_internal.h>
+
+#ifdef CONFIG_IPIPE
+
+struct clock_event_device;
+
+struct ipipe_hostrt_data {
+ short live;
+ seqcount_t seqcount;
+ time_t wall_time_sec;
+ u32 wall_time_nsec;
+ struct timespec wall_to_monotonic;
+ u64 cycle_last;
+ u64 mask;
+ u32 mult;
+ u32 shift;
+};
+
+enum clock_event_mode {
+ CLOCK_EVT_MODE_PERIODIC,
+ CLOCK_EVT_MODE_ONESHOT,
+ CLOCK_EVT_MODE_UNUSED,
+ CLOCK_EVT_MODE_SHUTDOWN,
+};
+
+struct ipipe_timer {
+ int irq;
+ void (*request)(struct ipipe_timer *timer, int steal);
+ int (*set)(unsigned long ticks, void *timer);
+ void (*ack)(void);
+ void (*release)(struct ipipe_timer *timer);
+
+ /* Only if registering a timer directly */
+ const char *name;
+ unsigned rating;
+ unsigned long freq;
+ unsigned long min_delay_ticks;
+ unsigned long max_delay_ticks;
+ const struct cpumask *cpumask;
+
+ /* For internal use */
+ void *timer_set; /* pointer passed to ->set() callback */
+ struct clock_event_device *host_timer;
+ struct list_head link;
+
+ /* Conversions between clock frequency and timer frequency */
+ unsigned c2t_integ;
+ unsigned c2t_frac;
+
+ /* For clockevent interception */
+ u32 real_mult;
+ u32 real_shift;
+ void (*mode_handler)(enum clock_event_mode mode,
+ struct clock_event_device *);
+ int orig_mode;
+ int (*orig_set_state_periodic)(struct clock_event_device *);
+ int (*orig_set_state_oneshot)(struct clock_event_device *);
+ int (*orig_set_state_oneshot_stopped)(struct clock_event_device *);
+ int (*orig_set_state_shutdown)(struct clock_event_device *);
+ int (*orig_set_next_event)(unsigned long evt,
+ struct clock_event_device *cdev);
+ unsigned int (*refresh_freq)(void);
+};
+
+#define __ipipe_hrtimer_irq __ipipe_raw_cpu_read(ipipe_percpu.hrtimer_irq)
+
+extern unsigned long __ipipe_hrtimer_freq;
+
+/*
+ * Called by clockevents_register_device, to register a piggybacked
+ * ipipe timer, if there is one
+ */
+void ipipe_host_timer_register(struct clock_event_device *clkevt);
+
+/*
+ * Called by tick_cleanup_dead_cpu, to drop per-CPU timer devices
+ */
+void ipipe_host_timer_cleanup(struct clock_event_device *clkevt);
+
+/*
+ * Register a standalone ipipe timer
+ */
+void ipipe_timer_register(struct ipipe_timer *timer);
+
+/*
+ * Choose the best timer for each CPU and take over its handling.
+ */
+int ipipe_select_timers(const struct cpumask *mask);
+
+/*
+ * Release the per-cpu timers
+ */
+void ipipe_timers_release(void);
+
+/*
+ * Start handling the per-cpu timer IRQ and intercept the Linux clockevent
+ * device callbacks.
+ */
+int ipipe_timer_start(void (*tick_handler)(void),
+ void (*emumode)(enum clock_event_mode mode,
+ struct clock_event_device *cdev),
+ int (*emutick)(unsigned long evt,
+ struct clock_event_device *cdev),
+ unsigned cpu);
+
+/*
+ * Stop handling a per-cpu timer
+ */
+void ipipe_timer_stop(unsigned cpu);
+
+/*
+ * Program the timer
+ */
+void ipipe_timer_set(unsigned long delay);
+
+const char *ipipe_timer_name(void);
+
+unsigned ipipe_timer_ns2ticks(struct ipipe_timer *timer, unsigned ns);
+
+void __ipipe_timer_refresh_freq(unsigned int hrclock_freq);
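+
+/*
+ * Typical takeover sequence for a client real-time core (sketch only;
+ * core_tick, core_emumode and core_emutick stand for hypothetical
+ * client callbacks, error checking omitted):
+ *
+ *	ipipe_select_timers(cpu_online_mask);
+ *	ipipe_timer_start(core_tick, core_emumode, core_emutick, cpu);
+ *	ipipe_timer_set(delay);
+ *	...
+ *	ipipe_timer_stop(cpu);
+ *	ipipe_timers_release();
+ */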
+
+#else /* !CONFIG_IPIPE */
+
+#define ipipe_host_timer_register(clkevt) do { } while (0)
+
+#define ipipe_host_timer_cleanup(clkevt) do { } while (0)
+
+#endif /* !CONFIG_IPIPE */
+
+#ifdef CONFIG_IPIPE_HAVE_HOSTRT
+void ipipe_update_hostrt(struct timekeeper *tk);
+#else
+static inline void
+ipipe_update_hostrt(struct timekeeper *tk) {}
+#endif
+
+#endif /* __LINUX_IPIPE_TICKDEV_H */
diff --git a/include/linux/ipipe_trace.h b/include/linux/ipipe_trace.h
new file mode 100644
index 000000000000..7d0c867a360b
--- /dev/null
+++ b/include/linux/ipipe_trace.h
@@ -0,0 +1,78 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe_trace.h
+ *
+ * Copyright (C) 2005 Luotao Fu.
+ * 2005-2007 Jan Kiszka.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef _LINUX_IPIPE_TRACE_H
+#define _LINUX_IPIPE_TRACE_H
+
+#ifdef CONFIG_IPIPE_TRACE
+
+#include <linux/types.h>
+
+struct pt_regs;
+
+void ipipe_trace_begin(unsigned long v);
+void ipipe_trace_end(unsigned long v);
+void ipipe_trace_freeze(unsigned long v);
+void ipipe_trace_special(unsigned char special_id, unsigned long v);
+void ipipe_trace_pid(pid_t pid, short prio);
+void ipipe_trace_event(unsigned char id, unsigned long delay_tsc);
+int ipipe_trace_max_reset(void);
+int ipipe_trace_frozen_reset(void);
+void ipipe_trace_irqbegin(int irq, struct pt_regs *regs);
+void ipipe_trace_irqend(int irq, struct pt_regs *regs);
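+
+/*
+ * Usage sketch: bracket a code path with matching begin/end markers,
+ * then inspect the longest recorded path via /proc/ipipe/trace/*.
+ * The marker value (0xdead here) is arbitrary and merely tags the
+ * trace points:
+ *
+ *	ipipe_trace_begin(0xdead);
+ *	... traced section ...
+ *	ipipe_trace_end(0xdead);
+ *
+ * ipipe_trace_freeze(v) snapshots the current trace for later
+ * inspection.
+ */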
+
+#else /* !CONFIG_IPIPE_TRACE */
+
+#define ipipe_trace_begin(v) do { (void)(v); } while(0)
+#define ipipe_trace_end(v) do { (void)(v); } while(0)
+#define ipipe_trace_freeze(v) do { (void)(v); } while(0)
+#define ipipe_trace_special(id, v) do { (void)(id); (void)(v); } while(0)
+#define ipipe_trace_pid(pid, prio) do { (void)(pid); (void)(prio); } while(0)
+#define ipipe_trace_event(id, delay_tsc) do { (void)(id); (void)(delay_tsc); } while(0)
+#define ipipe_trace_max_reset() ({ 0; })
+#define ipipe_trace_frozen_reset() ({ 0; })
+#define ipipe_trace_irqbegin(irq, regs) do { } while(0)
+#define ipipe_trace_irqend(irq, regs) do { } while(0)
+
+#endif /* !CONFIG_IPIPE_TRACE */
+
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+void ipipe_trace_panic_freeze(void);
+void ipipe_trace_panic_dump(void);
+#else
+static inline void ipipe_trace_panic_freeze(void) { }
+static inline void ipipe_trace_panic_dump(void) { }
+#endif
+
+#ifdef CONFIG_IPIPE_TRACE_IRQSOFF
+#define ipipe_trace_irq_entry(irq) ipipe_trace_begin(irq)
+#define ipipe_trace_irq_exit(irq) ipipe_trace_end(irq)
+#define ipipe_trace_irqsoff() ipipe_trace_begin(0x80000000UL)
+#define ipipe_trace_irqson() ipipe_trace_end(0x80000000UL)
+#else
+#define ipipe_trace_irq_entry(irq) do { (void)(irq);} while(0)
+#define ipipe_trace_irq_exit(irq) do { (void)(irq);} while(0)
+#define ipipe_trace_irqsoff() do { } while(0)
+#define ipipe_trace_irqson() do { } while(0)
+#endif
+
+#endif /* !__LINUX_IPIPE_TRACE_H */
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 5655da9eb1fb..150a72d0bdfa 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -498,6 +498,11 @@ struct irq_chip {
void (*irq_bus_lock)(struct irq_data *data);
void (*irq_bus_sync_unlock)(struct irq_data *data);
+#ifdef CONFIG_IPIPE
+ void (*irq_move)(struct irq_data *data);
+ void (*irq_hold)(struct irq_data *data);
+ void (*irq_release)(struct irq_data *data);
+#endif /* CONFIG_IPIPE */
void (*irq_cpu_online)(struct irq_data *data);
void (*irq_cpu_offline)(struct irq_data *data);
@@ -543,6 +548,7 @@ struct irq_chip {
* IRQCHIP_SUPPORTS_LEVEL_MSI Chip can provide two doorbells for Level MSIs
* IRQCHIP_SUPPORTS_NMI: Chip can deliver NMIs, only for root irqchips
* IRQCHIP_AFFINITY_PRE_STARTUP: Default affinity update before startup
+ * IRQCHIP_PIPELINE_SAFE: Chip can work in pipelined mode
*/
enum {
IRQCHIP_SET_TYPE_MASKED = (1 << 0),
@@ -555,6 +561,7 @@ enum {
IRQCHIP_SUPPORTS_LEVEL_MSI = (1 << 7),
IRQCHIP_SUPPORTS_NMI = (1 << 8),
IRQCHIP_AFFINITY_PRE_STARTUP = (1 << 10),
+ IRQCHIP_PIPELINE_SAFE = (1 << 11),
};
#include <linux/irqdesc.h>
@@ -651,6 +658,11 @@ extern void irq_chip_mask_parent(struct irq_data *data);
extern void irq_chip_mask_ack_parent(struct irq_data *data);
extern void irq_chip_unmask_parent(struct irq_data *data);
extern void irq_chip_eoi_parent(struct irq_data *data);
+#ifdef CONFIG_IPIPE
+extern void irq_chip_hold_parent(struct irq_data *data);
+extern void irq_chip_release_parent(struct irq_data *data);
+#endif
+
extern int irq_chip_set_affinity_parent(struct irq_data *data,
const struct cpumask *dest,
bool force);
@@ -777,7 +789,14 @@ extern int irq_set_irq_type(unsigned int irq, unsigned int type);
extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry);
extern int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset,
struct msi_desc *entry);
-extern struct irq_data *irq_get_irq_data(unsigned int irq);
+
+static inline __attribute__((const)) struct irq_data *
+irq_get_irq_data(unsigned int irq)
+{
+ struct irq_desc *desc = irq_to_desc(irq);
+
+ return desc ? &desc->irq_data : NULL;
+}
static inline struct irq_chip *irq_get_chip(unsigned int irq)
{
@@ -1020,7 +1039,11 @@ struct irq_chip_type {
* different flow mechanisms (level/edge) for it.
*/
struct irq_chip_generic {
+#ifdef CONFIG_IPIPE
+ ipipe_spinlock_t lock;
+#else
raw_spinlock_t lock;
+#endif
void __iomem *reg_base;
u32 (*reg_readl)(void __iomem *addr);
void (*reg_writel)(u32 val, void __iomem *addr);
@@ -1148,18 +1171,28 @@ static inline struct irq_chip_type *irq_data_get_chip_type(struct irq_data *d)
#define IRQ_MSK(n) (u32)((n) < 32 ? ((1 << (n)) - 1) : UINT_MAX)
#ifdef CONFIG_SMP
-static inline void irq_gc_lock(struct irq_chip_generic *gc)
+static inline unsigned long irq_gc_lock(struct irq_chip_generic *gc)
{
- raw_spin_lock(&gc->lock);
+ unsigned long flags = 0;
+ raw_spin_lock_irqsave_cond(&gc->lock, flags);
+ return flags;
}
-static inline void irq_gc_unlock(struct irq_chip_generic *gc)
+static inline void
+irq_gc_unlock(struct irq_chip_generic *gc, unsigned long flags)
{
- raw_spin_unlock(&gc->lock);
+ raw_spin_unlock_irqrestore_cond(&gc->lock, flags);
}
#else
-static inline void irq_gc_lock(struct irq_chip_generic *gc) { }
-static inline void irq_gc_unlock(struct irq_chip_generic *gc) { }
+static inline unsigned long irq_gc_lock(struct irq_chip_generic *gc)
+{
+ return hard_cond_local_irq_save();
+}
+static inline void
+irq_gc_unlock(struct irq_chip_generic *gc, unsigned long flags)
+{
+ hard_cond_local_irq_restore(flags);
+}
#endif
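+
+/*
+ * irq_gc_lock() returns the saved interrupt state, which callers must
+ * hand back to irq_gc_unlock(), e.g.:
+ *
+ *	unsigned long flags = irq_gc_lock(gc);
+ *	...
+ *	irq_gc_unlock(gc, flags);
+ */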
/*
diff --git a/include/linux/irqchip/arm-gic-common.h b/include/linux/irqchip/arm-gic-common.h
index b9850f5f1906..0a588a04ccc1 100644
--- a/include/linux/irqchip/arm-gic-common.h
+++ b/include/linux/irqchip/arm-gic-common.h
@@ -10,7 +10,12 @@
#include <linux/types.h>
#include <linux/ioport.h>
+#ifndef CONFIG_IPIPE
#define GICD_INT_DEF_PRI 0xa0
+#else
+#define GICD_INT_DEF_PRI 0x10
+#endif
+
#define GICD_INT_DEF_PRI_X4 ((GICD_INT_DEF_PRI << 24) |\
(GICD_INT_DEF_PRI << 16) |\
(GICD_INT_DEF_PRI << 8) |\
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index d6e2ab538ef2..089ccb387762 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -57,6 +57,10 @@ struct irq_desc {
struct irq_common_data irq_common_data;
struct irq_data irq_data;
unsigned int __percpu *kstat_irqs;
+#ifdef CONFIG_IPIPE
+ void (*ipipe_ack)(struct irq_desc *desc);
+ void (*ipipe_end)(struct irq_desc *desc);
+#endif /* CONFIG_IPIPE */
irq_flow_handler_t handle_irq;
#ifdef CONFIG_IRQ_PREFLOW_FASTEOI
irq_preflow_handler_t preflow_handler;
@@ -186,6 +190,10 @@ static inline int irq_desc_has_action(struct irq_desc *desc)
return desc->action != NULL;
}
+irq_flow_handler_t
+__ipipe_setup_irq_desc(struct irq_desc *desc, irq_flow_handler_t handle,
+ int is_chained);
+
static inline int irq_has_action(unsigned int irq)
{
return irq_desc_has_action(irq_to_desc(irq));
@@ -206,7 +214,7 @@ static inline void irq_set_handler_locked(struct irq_data *data,
{
struct irq_desc *desc = irq_data_to_desc(data);
- desc->handle_irq = handler;
+ desc->handle_irq = __ipipe_setup_irq_desc(desc, handler, 0);
}
/**
@@ -227,7 +235,7 @@ irq_set_chip_handler_name_locked(struct irq_data *data, struct irq_chip *chip,
{
struct irq_desc *desc = irq_data_to_desc(data);
- desc->handle_irq = handler;
+ desc->handle_irq = __ipipe_setup_irq_desc(desc, handler, 0);
desc->name = name;
data->chip = chip;
}
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 21619c92c377..d640c584c3d8 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -148,6 +148,18 @@ do { \
#endif /* CONFIG_TRACE_IRQFLAGS */
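+
+/*
+ * local_irq_disable_full() disables both the virtual (root domain)
+ * interrupt state and the real CPU interrupt mask when the pipeline
+ * is enabled; without CONFIG_IPIPE, the _full helpers are plain
+ * aliases for local_irq_disable()/local_irq_enable().
+ */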
+#ifdef CONFIG_IPIPE
+#define local_irq_enable_full() local_irq_enable()
+#define local_irq_disable_full() \
+ do { \
+ local_irq_disable(); \
+ hard_local_irq_disable(); \
+ } while (0)
+#else
+#define local_irq_enable_full() local_irq_enable()
+#define local_irq_disable_full() local_irq_disable()
+#endif
+
#define local_save_flags(flags) raw_local_save_flags(flags)
/*
diff --git a/include/linux/irqnr.h b/include/linux/irqnr.h
index 3496baa0b07f..c731f1874042 100644
--- a/include/linux/irqnr.h
+++ b/include/linux/irqnr.h
@@ -6,7 +6,11 @@
extern int nr_irqs;
+#if !defined(CONFIG_IPIPE) || defined(CONFIG_SPARSE_IRQ)
extern struct irq_desc *irq_to_desc(unsigned int irq);
+#else
+#define irq_to_desc(irq) ({ ipipe_virtual_irq_p(irq) ? NULL : &irq_desc[irq]; })
+#endif
unsigned int irq_get_next_irq(unsigned int offset);
# define for_each_irq_desc(irq, desc) \
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 1fdb251947ed..90c3ad0bf6bc 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -15,6 +15,7 @@
#include <linux/printk.h>
#include <linux/build_bug.h>
#include <asm/byteorder.h>
+#include <asm-generic/ipipe.h>
#include <asm/div64.h>
#include <uapi/linux/kernel.h>
#include <asm/div64.h>
@@ -203,9 +204,12 @@ struct user;
#ifdef CONFIG_PREEMPT_VOLUNTARY
extern int _cond_resched(void);
-# define might_resched() _cond_resched()
+# define might_resched() do { \
+ ipipe_root_only(); \
+ _cond_resched(); \
+ } while (0)
#else
-# define might_resched() do { } while (0)
+# define might_resched() ipipe_root_only()
#endif
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ee7d57478a45..60abc26be11b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -266,6 +266,10 @@ struct kvm_vcpu {
#ifdef CONFIG_PREEMPT_NOTIFIERS
struct preempt_notifier preempt_notifier;
#endif
+#ifdef CONFIG_IPIPE
+ struct ipipe_vm_notifier ipipe_notifier;
+ bool ipipe_put_vcpu;
+#endif
int cpu;
int vcpu_id; /* id given by userspace at creation */
int vcpu_idx; /* index in kvm->vcpus array */
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index bbb68dba37cc..ee3c9f08c57e 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -252,7 +252,28 @@ do { \
#endif /* CONFIG_PREEMPT_COUNT */
-#ifdef MODULE
+#ifdef CONFIG_IPIPE
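+/*
+ * hard_preempt_disable() masks interrupts at CPU level and, when
+ * running over the root domain, also disables kernel preemption;
+ * hard_preempt_enable() reverts both operations, checking for
+ * rescheduling when it restores an unmasked interrupt state.
+ */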
+#define hard_preempt_disable() \
+ ({ \
+ unsigned long __flags__; \
+ __flags__ = hard_local_irq_save(); \
+ if (__ipipe_root_p) \
+ preempt_disable(); \
+ __flags__; \
+ })
+
+#define hard_preempt_enable(__flags__) \
+ do { \
+ if (__ipipe_root_p) { \
+ preempt_enable_no_resched(); \
+ hard_local_irq_restore(__flags__); \
+ if (!hard_irqs_disabled_flags(__flags__)) \
+ preempt_check_resched(); \
+ } else \
+ hard_local_irq_restore(__flags__); \
+ } while (0)
+
+#elif defined(MODULE)
/*
* Modules have no business playing preemption tricks.
*/
@@ -260,7 +281,7 @@ do { \
#undef preempt_enable_no_resched
#undef preempt_enable_no_resched_notrace
#undef preempt_check_resched
-#endif
+#endif /* !IPIPE && MODULE */
#define preempt_set_need_resched() \
do { \
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 3b5cb66d8bc1..f5fca32df941 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -158,6 +158,17 @@ static inline void printk_nmi_direct_enter(void) { }
static inline void printk_nmi_direct_exit(void) { }
#endif /* PRINTK_NMI */
+#ifdef CONFIG_RAW_PRINTK
+void raw_vprintk(const char *fmt, va_list ap);
+asmlinkage __printf(1, 2)
+void raw_printk(const char *fmt, ...);
+#else
+static inline __cold
+void raw_vprintk(const char *s, va_list ap) { }
+static inline __printf(1, 2) __cold
+void raw_printk(const char *s, ...) { }
+#endif
+
#ifdef CONFIG_PRINTK
asmlinkage __printf(5, 0)
int vprintk_emit(int facility, int level,
diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
index 3dcd617e65ae..ac48e29bbef0 100644
--- a/include/linux/rwlock.h
+++ b/include/linux/rwlock.h
@@ -67,8 +67,8 @@ do { \
#define read_trylock(lock) __cond_lock(lock, _raw_read_trylock(lock))
#define write_trylock(lock) __cond_lock(lock, _raw_write_trylock(lock))
-#define write_lock(lock) _raw_write_lock(lock)
-#define read_lock(lock) _raw_read_lock(lock)
+#define write_lock(lock) PICK_RWOP(_write_lock, lock)
+#define read_lock(lock) PICK_RWOP(_read_lock, lock)
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
@@ -102,8 +102,8 @@ do { \
#define read_lock_bh(lock) _raw_read_lock_bh(lock)
#define write_lock_irq(lock) _raw_write_lock_irq(lock)
#define write_lock_bh(lock) _raw_write_lock_bh(lock)
-#define read_unlock(lock) _raw_read_unlock(lock)
-#define write_unlock(lock) _raw_write_unlock(lock)
+#define read_unlock(lock) PICK_RWOP(_read_unlock, lock)
+#define write_unlock(lock) PICK_RWOP(_write_unlock, lock)
#define read_unlock_irq(lock) _raw_read_unlock_irq(lock)
#define write_unlock_irq(lock) _raw_write_unlock_irq(lock)
diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
index 86ebb4bf9c6e..c1ed96fa0726 100644
--- a/include/linux/rwlock_api_smp.h
+++ b/include/linux/rwlock_api_smp.h
@@ -141,7 +141,9 @@ static inline int __raw_write_trylock(rwlock_t *lock)
* even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
* not re-enabled during lock-acquire (which the preempt-spin-ops do):
*/
-#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)
+#if !defined(CONFIG_GENERIC_LOCKBREAK) || \
+ defined(CONFIG_DEBUG_LOCK_ALLOC) || \
+ defined(CONFIG_IPIPE)
static inline void __raw_read_lock(rwlock_t *lock)
{
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d0e639497b10..09db655330bc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -89,7 +89,9 @@ struct task_group;
#define TASK_WAKING 0x0200
#define TASK_NOLOAD 0x0400
#define TASK_NEW 0x0800
-#define TASK_STATE_MAX 0x1000
+#define TASK_HARDENING 0x1000
+#define TASK_NOWAKEUP 0x2000
+#define TASK_STATE_MAX 0x4000
/* Convenience macros for the sake of set_current_state: */
#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index dfd82eab2902..5fb7c7e364fb 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -74,6 +74,7 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */
#define MMF_MULTIPROCESS 27 /* mm is shared between processes */
#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
+#define MMF_VM_PINNED		31	/* on-demand load-up and COW disabled */
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
MMF_DISABLE_THP_MASK)
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 031ce8617df8..0732eca3b6d4 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -91,10 +91,12 @@
# include <linux/spinlock_up.h>
#endif
+#include <linux/ipipe_lock.h>
+
#ifdef CONFIG_DEBUG_SPINLOCK
extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
struct lock_class_key *key);
-# define raw_spin_lock_init(lock) \
+# define __real_raw_spin_lock_init(lock) \
do { \
static struct lock_class_key __key; \
\
@@ -102,11 +104,14 @@ do { \
} while (0)
#else
-# define raw_spin_lock_init(lock) \
+# define __real_raw_spin_lock_init(lock) \
do { *(lock) = __RAW_SPIN_LOCK_UNLOCKED(lock); } while (0)
#endif
+#define raw_spin_lock_init(lock) PICK_SPINOP(_lock_init, lock)
-#define raw_spin_is_locked(lock) arch_spin_is_locked(&(lock)->raw_lock)
+#define __real_raw_spin_is_locked(lock) \
+ arch_spin_is_locked(&(lock)->raw_lock)
+#define raw_spin_is_locked(lock) PICK_SPINOP_RET(_is_locked, lock, int)
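+
+/*
+ * The PICK_* helpers dispatch each operation according to the lock
+ * type, so the same raw_spin_*() call sites work on both regular raw
+ * spinlocks and ipipe_spinlock_t locks.
+ */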
#ifdef arch_spin_is_contended
#define raw_spin_is_contended(lock) arch_spin_is_contended(&(lock)->raw_lock)
@@ -218,9 +223,11 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
* various methods are defined as nops in the case they are not
* required.
*/
-#define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock))
+#define __real_raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock))
+#define raw_spin_trylock(lock) PICK_SPINOP_RET(_trylock, lock, int)
-#define raw_spin_lock(lock) _raw_spin_lock(lock)
+#define __real_raw_spin_lock(lock) _raw_spin_lock(lock)
+#define raw_spin_lock(lock) PICK_SPINOP(_lock, lock)
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define raw_spin_lock_nested(lock, subclass) \
@@ -244,7 +251,7 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-#define raw_spin_lock_irqsave(lock, flags) \
+#define __real_raw_spin_lock_irqsave(lock, flags) \
do { \
typecheck(unsigned long, flags); \
flags = _raw_spin_lock_irqsave(lock); \
@@ -266,7 +273,7 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
#else
-#define raw_spin_lock_irqsave(lock, flags) \
+#define __real_raw_spin_lock_irqsave(lock, flags) \
do { \
typecheck(unsigned long, flags); \
_raw_spin_lock_irqsave(lock, flags); \
@@ -277,34 +284,46 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
#endif
-#define raw_spin_lock_irq(lock) _raw_spin_lock_irq(lock)
+#define raw_spin_lock_irqsave(lock, flags) \
+ PICK_SPINLOCK_IRQSAVE(lock, flags)
+
+#define __real_raw_spin_lock_irq(lock) _raw_spin_lock_irq(lock)
+#define raw_spin_lock_irq(lock) PICK_SPINOP(_lock_irq, lock)
#define raw_spin_lock_bh(lock) _raw_spin_lock_bh(lock)
-#define raw_spin_unlock(lock) _raw_spin_unlock(lock)
-#define raw_spin_unlock_irq(lock) _raw_spin_unlock_irq(lock)
+#define __real_raw_spin_unlock(lock) _raw_spin_unlock(lock)
+#define raw_spin_unlock(lock) PICK_SPINOP(_unlock, lock)
+#define __real_raw_spin_unlock_irq(lock) _raw_spin_unlock_irq(lock)
+#define raw_spin_unlock_irq(lock) PICK_SPINOP(_unlock_irq, lock)
-#define raw_spin_unlock_irqrestore(lock, flags) \
+#define __real_raw_spin_unlock_irqrestore(lock, flags) \
do { \
typecheck(unsigned long, flags); \
_raw_spin_unlock_irqrestore(lock, flags); \
} while (0)
+#define raw_spin_unlock_irqrestore(lock, flags) \
+ PICK_SPINUNLOCK_IRQRESTORE(lock, flags)
+
#define raw_spin_unlock_bh(lock) _raw_spin_unlock_bh(lock)
#define raw_spin_trylock_bh(lock) \
__cond_lock(lock, _raw_spin_trylock_bh(lock))
-#define raw_spin_trylock_irq(lock) \
+#define __real_raw_spin_trylock_irq(lock) \
({ \
local_irq_disable(); \
- raw_spin_trylock(lock) ? \
+ __real_raw_spin_trylock(lock) ? \
1 : ({ local_irq_enable(); 0; }); \
})
+#define raw_spin_trylock_irq(lock) PICK_SPINTRYLOCK_IRQ(lock)
-#define raw_spin_trylock_irqsave(lock, flags) \
+#define __real_raw_spin_trylock_irqsave(lock, flags) \
({ \
local_irq_save(flags); \
raw_spin_trylock(lock) ? \
1 : ({ local_irq_restore(flags); 0; }); \
})
+#define raw_spin_trylock_irqsave(lock, flags) \
+ PICK_SPINTRYLOCK_IRQSAVE(lock, flags)
/* Include rwlock functions */
#include <linux/rwlock.h>
@@ -329,24 +348,17 @@ static __always_inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
#define spin_lock_init(_lock) \
do { \
- spinlock_check(_lock); \
- raw_spin_lock_init(&(_lock)->rlock); \
+ raw_spin_lock_init(_lock); \
} while (0)
-static __always_inline void spin_lock(spinlock_t *lock)
-{
- raw_spin_lock(&lock->rlock);
-}
+#define spin_lock(lock) raw_spin_lock(lock)
static __always_inline void spin_lock_bh(spinlock_t *lock)
{
raw_spin_lock_bh(&lock->rlock);
}
-static __always_inline int spin_trylock(spinlock_t *lock)
-{
- return raw_spin_trylock(&lock->rlock);
-}
+#define spin_trylock(lock) raw_spin_trylock(lock)
#define spin_lock_nested(lock, subclass) \
do { \
@@ -358,14 +370,11 @@ do { \
raw_spin_lock_nest_lock(spinlock_check(lock), nest_lock); \
} while (0)
-static __always_inline void spin_lock_irq(spinlock_t *lock)
-{
- raw_spin_lock_irq(&lock->rlock);
-}
+#define spin_lock_irq(lock) raw_spin_lock_irq(lock)
#define spin_lock_irqsave(lock, flags) \
do { \
- raw_spin_lock_irqsave(spinlock_check(lock), flags); \
+ raw_spin_lock_irqsave(lock, flags); \
} while (0)
#define spin_lock_irqsave_nested(lock, flags, subclass) \
@@ -373,39 +382,28 @@ do { \
raw_spin_lock_irqsave_nested(spinlock_check(lock), flags, subclass); \
} while (0)
-static __always_inline void spin_unlock(spinlock_t *lock)
-{
- raw_spin_unlock(&lock->rlock);
-}
+#define spin_unlock(lock) raw_spin_unlock(lock)
static __always_inline void spin_unlock_bh(spinlock_t *lock)
{
raw_spin_unlock_bh(&lock->rlock);
}
-static __always_inline void spin_unlock_irq(spinlock_t *lock)
-{
- raw_spin_unlock_irq(&lock->rlock);
-}
+#define spin_unlock_irq(lock) raw_spin_unlock_irq(lock)
-static __always_inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
-{
- raw_spin_unlock_irqrestore(&lock->rlock, flags);
-}
+#define spin_unlock_irqrestore(lock, flags) \
+ raw_spin_unlock_irqrestore(lock, flags)
static __always_inline int spin_trylock_bh(spinlock_t *lock)
{
return raw_spin_trylock_bh(&lock->rlock);
}
-static __always_inline int spin_trylock_irq(spinlock_t *lock)
-{
- return raw_spin_trylock_irq(&lock->rlock);
-}
+#define spin_trylock_irq(lock) raw_spin_trylock_irq(lock)
#define spin_trylock_irqsave(lock, flags) \
({ \
- raw_spin_trylock_irqsave(spinlock_check(lock), flags); \
+ raw_spin_trylock_irqsave(lock, flags); \
})
/**
diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h
index b762eaba4cdf..5098b836e866 100644
--- a/include/linux/spinlock_api_smp.h
+++ b/include/linux/spinlock_api_smp.h
@@ -99,7 +99,9 @@ static inline int __raw_spin_trylock(raw_spinlock_t *lock)
* even on CONFIG_PREEMPTION, because lockdep assumes that interrupts are
* not re-enabled during lock-acquire (which the preempt-spin-ops do):
*/
-#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)
+#if !defined(CONFIG_GENERIC_LOCKBREAK) || \
+ defined(CONFIG_DEBUG_LOCK_ALLOC) || \
+ defined(CONFIG_IPIPE)
static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t *lock)
{
@@ -113,7 +115,7 @@ static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t *lock)
* do_raw_spin_lock_flags() code, because lockdep assumes
* that interrupts are not re-enabled during lock-acquire:
*/
-#ifdef CONFIG_LOCKDEP
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_IPIPE)
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
#else
do_raw_spin_lock_flags(lock, &flags);
diff --git a/include/linux/spinlock_up.h b/include/linux/spinlock_up.h
index 0ac9112c1bbe..b8c6c6d477d6 100644
--- a/include/linux/spinlock_up.h
+++ b/include/linux/spinlock_up.h
@@ -48,16 +48,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
lock->slock = 1;
}
-/*
- * Read-write spinlocks. No debug version.
- */
-#define arch_read_lock(lock) do { barrier(); (void)(lock); } while (0)
-#define arch_write_lock(lock) do { barrier(); (void)(lock); } while (0)
-#define arch_read_trylock(lock) ({ barrier(); (void)(lock); 1; })
-#define arch_write_trylock(lock) ({ barrier(); (void)(lock); 1; })
-#define arch_read_unlock(lock) do { barrier(); (void)(lock); } while (0)
-#define arch_write_unlock(lock) do { barrier(); (void)(lock); } while (0)
-
#else /* DEBUG_SPINLOCK */
#define arch_spin_is_locked(lock) ((void)(lock), 0)
/* for sched/core.c and kernel_lock.c: */
@@ -67,6 +57,13 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
# define arch_spin_trylock(lock) ({ barrier(); (void)(lock); 1; })
#endif /* DEBUG_SPINLOCK */
+#define arch_read_lock(lock) do { barrier(); (void)(lock); } while (0)
+#define arch_write_lock(lock) do { barrier(); (void)(lock); } while (0)
+#define arch_read_trylock(lock) ({ barrier(); (void)(lock); 1; })
+#define arch_write_trylock(lock) ({ barrier(); (void)(lock); 1; })
+#define arch_read_unlock(lock) do { barrier(); (void)(lock); } while (0)
+#define arch_write_unlock(lock) do { barrier(); (void)(lock); } while (0)
+
#define arch_spin_is_contended(lock) (((void)(lock), 0))
#endif /* __LINUX_SPINLOCK_UP_H */
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 69998fc5ffe9..fa4d05ae3fa2 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -139,13 +139,17 @@ int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
const struct cpumask *cpus);
#else /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
+#include <linux/interrupt.h>
+
static __always_inline int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
const struct cpumask *cpus)
{
unsigned long flags;
int ret;
local_irq_save(flags);
+ hard_irq_disable();
ret = fn(data);
+ hard_irq_enable();
local_irq_restore(flags);
return ret;
}
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index bee7573f40f5..cfd8cd3137e9 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -19,6 +19,7 @@
#include <linux/cpumask.h>
#include <linux/rcupdate.h>
#include <linux/tracepoint-defs.h>
+#include <linux/ipipe.h>
struct module;
struct tracepoint;
@@ -220,7 +221,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
__DO_TRACE(&__tracepoint_##name, \
TP_PROTO(data_proto), \
TP_ARGS(data_args), \
- TP_CONDITION(cond), 1); \
+ TP_CONDITION(cond), ipipe_root_p); \
}
#else
#define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args)
@@ -246,7 +247,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
TP_PROTO(data_proto), \
TP_ARGS(data_args), \
TP_CONDITION(cond), 0); \
- if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
+ if (IS_ENABLED(CONFIG_LOCKDEP) && (cond) && \
+ !IS_ENABLED(CONFIG_IPIPE)) { \
WARN_ON_ONCE(!rcu_is_watching()); \
} \
} \
diff --git a/init/Kconfig b/init/Kconfig
index f641518f4ac5..c7f97ca28dc6 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1410,6 +1410,18 @@ config PRINTK_NMI
depends on PRINTK
depends on HAVE_NMI
+config RAW_PRINTK
+ bool "Enable support for raw printk"
+ default n
+ help
+ This option enables a printk variant called raw_printk() for
+ writing all output unmodified to a raw console channel
+ immediately, without any header or preparation whatsoever,
+ usable from any context.
+
+ Unlike early_printk() console devices, raw_printk() devices
+ can live past the boot sequence.
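+
+	  For instance, raw_printk("hello\n") may be used from contexts
+	  where a regular printk() would not be suitable.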
+
config BUG
bool "BUG() support" if EXPERT
default y
diff --git a/init/main.c b/init/main.c
index a17a111d9336..ee0e9eb1b0c8 100644
--- a/init/main.c
+++ b/init/main.c
@@ -47,6 +47,7 @@
#include <linux/cpuset.h>
#include <linux/cgroup.h>
#include <linux/efi.h>
+#include <linux/ipipe.h>
#include <linux/tick.h>
#include <linux/sched/isolation.h>
#include <linux/interrupt.h>
@@ -585,7 +586,7 @@ asmlinkage __visible void __init start_kernel(void)
cgroup_init_early();
- local_irq_disable();
+ hard_local_irq_disable();
early_boot_irqs_disabled = true;
/*
@@ -625,6 +626,7 @@ asmlinkage __visible void __init start_kernel(void)
setup_log_buf(0);
vfs_caches_init_early();
sort_main_extable();
+ __ipipe_init_early();
trap_init();
mm_init();
@@ -681,6 +683,11 @@ asmlinkage __visible void __init start_kernel(void)
softirq_init();
timekeeping_init();
time_init();
+ /*
+ * We need to wait for the interrupt and time subsystems to be
+ * initialized before enabling the pipeline.
+ */
+ __ipipe_init();
/*
* For best initial stack canary entropy, prepare it after:
@@ -1026,6 +1033,7 @@ static void __init do_basic_setup(void)
cpuset_init_smp();
driver_init();
init_irq_proc();
+ __ipipe_init_proc();
do_ctors();
usermodehelper_enable();
do_initcalls();
diff --git a/kernel/Makefile b/kernel/Makefile
index d038b0de886e..ad281bcce38c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -88,6 +88,7 @@ obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
obj-$(CONFIG_SECCOMP) += seccomp.o
obj-$(CONFIG_RELAY) += relay.o
+obj-$(CONFIG_IPIPE) += ipipe/
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index be01a4d627c9..6fc82f6d3c33 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -114,7 +114,7 @@ void context_tracking_enter(enum ctx_state state)
* helpers are enough to protect RCU uses inside the exception. So
* just return immediately if we detect we are in an IRQ.
*/
- if (in_interrupt())
+ if (!ipipe_root_p || in_interrupt())
return;
local_irq_save(flags);
@@ -170,7 +170,7 @@ void context_tracking_exit(enum ctx_state state)
{
unsigned long flags;
- if (in_interrupt())
+ if (!ipipe_root_p || in_interrupt())
return;
local_irq_save(flags);
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index f88611fadb19..849da5b23e35 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -113,8 +113,8 @@ static struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS] = {
*/
atomic_t kgdb_active = ATOMIC_INIT(-1);
EXPORT_SYMBOL_GPL(kgdb_active);
-static DEFINE_RAW_SPINLOCK(dbg_master_lock);
-static DEFINE_RAW_SPINLOCK(dbg_slave_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(dbg_master_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(dbg_slave_lock);
/*
* We use NR_CPUs not PERCPU, in case kgdb is used to debug early
@@ -512,7 +512,9 @@ static int kgdb_reenter_check(struct kgdb_state *ks)
static void dbg_touch_watchdogs(void)
{
touch_softlockup_watchdog_sync();
+#ifndef CONFIG_IPIPE
clocksource_touch_watchdog();
+#endif
rcu_cpu_stall_reset();
}
@@ -544,7 +546,7 @@ acquirelock:
* Interrupts will be restored by the 'trap return' code, except when
* single stepping.
*/
- local_irq_save(flags);
+ flags = hard_local_irq_save();
cpu = ks->cpu;
kgdb_info[cpu].debuggerinfo = regs;
@@ -595,7 +597,7 @@ return_normal:
smp_mb__before_atomic();
atomic_dec(&slaves_in_kgdb);
dbg_touch_watchdogs();
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
rcu_read_unlock();
return 0;
}
@@ -614,7 +616,7 @@ return_normal:
atomic_set(&kgdb_active, -1);
raw_spin_unlock(&dbg_master_lock);
dbg_touch_watchdogs();
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
rcu_read_unlock();
goto acquirelock;
@@ -761,7 +763,7 @@ kgdb_restore:
atomic_set(&kgdb_active, -1);
raw_spin_unlock(&dbg_master_lock);
dbg_touch_watchdogs();
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
rcu_read_unlock();
return kgdb_info[cpu].ret_state;
@@ -880,9 +882,9 @@ static void kgdb_console_write(struct console *co, const char *s,
if (!kgdb_connected || atomic_read(&kgdb_active) != -1 || dbg_kdb_mode)
return;
- local_irq_save(flags);
+ flags = hard_local_irq_save();
gdbstub_msg_write(s, count);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
static struct console kgdbcons = {
diff --git a/kernel/exit.c b/kernel/exit.c
index 563bdaa76694..0649eeb1664d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -57,6 +57,7 @@
#include <trace/events/sched.h>
#include <linux/hw_breakpoint.h>
#include <linux/oom.h>
+#include <linux/ipipe.h>
#include <linux/writeback.h>
#include <linux/shm.h>
#include <linux/kcov.h>
@@ -812,6 +813,7 @@ void __noreturn do_exit(long code)
}
exit_signals(tsk); /* sets PF_EXITING */
+ __ipipe_report_exit(tsk);
/* sync mm's RSS info before statistics gathering */
if (tsk->mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index 5b4a19682207..c0a8ea55893b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -57,6 +57,7 @@
#include <linux/futex.h>
#include <linux/compat.h>
#include <linux/kthread.h>
+#include <linux/ipipe.h>
#include <linux/task_io_accounting_ops.h>
#include <linux/rcupdate.h>
#include <linux/ptrace.h>
@@ -93,6 +94,7 @@
#include <linux/kcov.h>
#include <linux/livepatch.h>
#include <linux/thread_info.h>
+#include <ipipe/thread_info.h>
#include <linux/stackleak.h>
#include <asm/pgtable.h>
@@ -904,6 +906,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
#endif
setup_thread_stack(tsk, orig);
+#ifdef CONFIG_IPIPE
+ __ipipe_init_threadflags(task_thread_info(tsk));
+ __ipipe_init_threadinfo(&task_thread_info(tsk)->ipipe_data);
+#endif
clear_user_return_notifier(tsk);
clear_tsk_need_resched(tsk);
set_task_stack_end_magic(tsk);
@@ -1077,6 +1083,7 @@ static inline void __mmput(struct mm_struct *mm)
exit_aio(mm);
ksm_exit(mm);
khugepaged_exit(mm); /* must run before exit_mmap */
+ __ipipe_report_cleanup(mm);
exit_mmap(mm);
mm_put_huge_zero_page(mm);
set_mm_exe_file(mm, NULL);
diff --git a/kernel/ipipe/Kconfig b/kernel/ipipe/Kconfig
new file mode 100644
index 000000000000..d17edb89f1f5
--- /dev/null
+++ b/kernel/ipipe/Kconfig
@@ -0,0 +1,47 @@
+
+config HAVE_IPIPE_SUPPORT
+ depends on GENERIC_CLOCKEVENTS
+ bool
+
+config IPIPE
+ bool "Interrupt pipeline"
+ depends on HAVE_IPIPE_SUPPORT
+ default n
+ ---help---
+ Activate this option if you want the interrupt pipeline to be
+ compiled in.
+
+config IPIPE_CORE
+ def_bool y if IPIPE
+
+config IPIPE_WANT_PTE_PINNING
+ bool
+
+config IPIPE_CORE_APIREV
+ int
+ depends on IPIPE
+ default 2
+ ---help---
+ The API revision level we implement.
+
+config IPIPE_WANT_APIREV_2
+ bool
+
+config IPIPE_TARGET_APIREV
+ int
+ depends on IPIPE
+ default IPIPE_CORE_APIREV
+ ---help---
+	  The API revision level we want (must be <=
+ IPIPE_CORE_APIREV).
+
+config IPIPE_HAVE_HOSTRT
+ bool
+
+config IPIPE_HAVE_EAGER_FPU
+ bool
+
+if IPIPE && ARM && RAW_PRINTK && !DEBUG_LL
+comment "CAUTION: DEBUG_LL must be selected, and properly configured for"
+comment "RAW_PRINTK to work. Otherwise, you will get no output on raw_printk()"
+endif
diff --git a/kernel/ipipe/Kconfig.debug b/kernel/ipipe/Kconfig.debug
new file mode 100644
index 000000000000..d1894cf62d54
--- /dev/null
+++ b/kernel/ipipe/Kconfig.debug
@@ -0,0 +1,100 @@
+config IPIPE_DEBUG
+ bool "I-pipe debugging"
+ depends on IPIPE
+ select RAW_PRINTK
+
+config IPIPE_DEBUG_CONTEXT
+ bool "Check for illicit cross-domain calls"
+ depends on IPIPE_DEBUG
+ default y
+ ---help---
+ Enable this feature to arm checkpoints in the kernel that
+	  verify the correct invocation context. On entry to critical
+	  Linux services, a warning is issued if the caller is not
+ running over the root domain.
+
+config IPIPE_DEBUG_INTERNAL
+ bool "Enable internal debug checks"
+ depends on IPIPE_DEBUG
+ default y
+ ---help---
+ When this feature is enabled, I-pipe will perform internal
+ consistency checks of its subsystems, e.g. on per-cpu variable
+ access.
+
+config HAVE_IPIPE_TRACER_SUPPORT
+ bool
+
+config IPIPE_TRACE
+ bool "Latency tracing"
+ depends on HAVE_IPIPE_TRACER_SUPPORT
+ depends on IPIPE_DEBUG
+	select FTRACE
+	select FUNCTION_TRACER
+ select KALLSYMS
+ select PROC_FS
+ ---help---
+ Activate this option if you want to use per-function tracing of
+ the kernel. The tracer will collect data via instrumentation
+	  features like the one below or with the help of explicit calls
+	  to ipipe_trace_xxx(). See include/linux/ipipe_trace.h for the
+	  in-kernel tracing API. The collected data and runtime control
+	  are available via /proc/ipipe/trace/*.
+
+if IPIPE_TRACE
+
+config IPIPE_TRACE_ENABLE
+ bool "Enable tracing on boot"
+ default y
+ ---help---
+	  Disable this option if you want to arm the tracer manually after
+	  booting ("echo 1 > /proc/ipipe/trace/enable"). This can reduce
+ boot time on slow embedded devices due to the tracer overhead.
+
+config IPIPE_TRACE_MCOUNT
+ bool "Instrument function entries"
+ default y
+ select FTRACE
+ select FUNCTION_TRACER
+ ---help---
+ When enabled, records every kernel function entry in the tracer
+ log. While this slows down the system noticeably, it provides
+ the highest level of information about the flow of events.
+	  However, it can be switched off in order to record only explicit
+ I-pipe trace points.
+
+config IPIPE_TRACE_IRQSOFF
+ bool "Trace IRQs-off times"
+ default y
+ ---help---
+ Activate this option if I-pipe shall trace the longest path
+ with hard-IRQs switched off.
+
+config IPIPE_TRACE_SHIFT
+ int "Depth of trace log (14 => 16Kpoints, 15 => 32Kpoints)"
+ range 10 18
+ default 14
+ ---help---
+ The number of trace points to hold tracing data for each
+ trace path, as a power of 2.
+
+config IPIPE_TRACE_VMALLOC
+ bool "Use vmalloc'ed trace buffer"
+ default y if EMBEDDED
+ ---help---
+ Instead of reserving static kernel data, the required buffer
+ is allocated via vmalloc during boot-up when this option is
+ enabled. This can help to start systems that are low on memory,
+ but it slightly degrades overall performance. Try this option
+ when a traced kernel hangs unexpectedly at boot time.
+
+config IPIPE_TRACE_PANIC
+ bool "Enable panic back traces"
+ default y
+ ---help---
+	  Provides services to freeze and dump a back trace in panic
+ situations. This is used on IPIPE_DEBUG_CONTEXT exceptions
+ as well as ordinary kernel oopses. You can control the number
+ of printed back trace points via /proc/ipipe/trace.
+
+endif
diff --git a/kernel/ipipe/Makefile b/kernel/ipipe/Makefile
new file mode 100644
index 000000000000..73755150634f
--- /dev/null
+++ b/kernel/ipipe/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_IPIPE) += core.o timer.o
+obj-$(CONFIG_IPIPE_TRACE) += tracer.o
diff --git a/kernel/ipipe/core.c b/kernel/ipipe/core.c
new file mode 100644
index 000000000000..485db973b81b
--- /dev/null
+++ b/kernel/ipipe/core.c
@@ -0,0 +1,2140 @@
+/* -*- linux-c -*-
+ * linux/kernel/ipipe/core.c
+ *
+ * Copyright (C) 2002-2012 Philippe Gerum.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Architecture-independent I-PIPE core support.
+ */
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/sched/debug.h>
+#include <linux/kallsyms.h>
+#include <linux/bitops.h>
+#include <linux/tick.h>
+#include <linux/interrupt.h>
+#include <linux/uaccess.h>
+#include <linux/cpuidle.h>
+#include <linux/sched/idle.h>
+#ifdef CONFIG_PROC_FS
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#endif /* CONFIG_PROC_FS */
+#include <linux/ipipe_trace.h>
+#include <linux/ipipe.h>
+#include <ipipe/setup.h>
+#include <asm/syscall.h>
+#include <asm/unistd.h>
+
+EXPORT_SYMBOL(sys_call_table);
+int (*rtai_fastcall_hook)(struct pt_regs *) = NULL;
+EXPORT_SYMBOL(rtai_fastcall_hook);
+int (*rtai_syscall_hook)(struct pt_regs *) = NULL;
+EXPORT_SYMBOL(rtai_syscall_hook);
+int (*rtai_trap_hook)(int exception, struct pt_regs *) = NULL;
+EXPORT_SYMBOL(rtai_trap_hook);
+int (*rtai_kevent_hook)(int kevent, void *) = NULL;
+EXPORT_SYMBOL(rtai_kevent_hook);
+void (*rtai_migration_hook)(struct task_struct *) = NULL;
+EXPORT_SYMBOL(rtai_migration_hook);
+void (*dispatch_irq_head)(unsigned int) = NULL;
+EXPORT_SYMBOL(dispatch_irq_head);
+
+struct ipipe_domain ipipe_root;
+EXPORT_SYMBOL_GPL(ipipe_root);
+
+struct ipipe_domain *ipipe_head_domain = &ipipe_root;
+EXPORT_SYMBOL_GPL(ipipe_head_domain);
+
+#ifdef CONFIG_SMP
+static __initdata struct ipipe_percpu_domain_data bootup_context = {
+ .status = IPIPE_STALL_MASK,
+ .domain = &ipipe_root,
+};
+#else
+#define bootup_context ipipe_percpu.root
+#endif /* !CONFIG_SMP */
+
+DEFINE_PER_CPU(struct ipipe_percpu_data, ipipe_percpu) = {
+ .root = {
+ .status = IPIPE_STALL_MASK,
+ .domain = &ipipe_root,
+ },
+ .curr = &bootup_context,
+ .hrtimer_irq = -1,
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+ .context_check = 1,
+#endif
+};
+EXPORT_PER_CPU_SYMBOL(ipipe_percpu);
+
+/* Up to 2k of pending work data per CPU. */
+#define WORKBUF_SIZE 2048
+static DEFINE_PER_CPU_ALIGNED(unsigned char[WORKBUF_SIZE], work_buf);
+static DEFINE_PER_CPU(void *, work_tail);
+static unsigned int __ipipe_work_virq;
+
+static void __ipipe_do_work(unsigned int virq, void *cookie);
+
+#ifdef CONFIG_SMP
+
+#define IPIPE_CRITICAL_TIMEOUT 1000000
+static cpumask_t __ipipe_cpu_sync_map;
+static cpumask_t __ipipe_cpu_lock_map;
+static cpumask_t __ipipe_cpu_pass_map;
+static unsigned long __ipipe_critical_lock;
+static IPIPE_DEFINE_SPINLOCK(__ipipe_cpu_barrier);
+static atomic_t __ipipe_critical_count = ATOMIC_INIT(0);
+static void (*__ipipe_cpu_sync) (void);
+
+#else /* !CONFIG_SMP */
+/*
+ * Create an alias to the unique root status, so that arch-dep code
+ * may get fast access to this percpu variable including from
+ * assembly. A hard-coded assumption is that root.status appears at
+ * offset #0 of the ipipe_percpu struct.
+ */
+extern unsigned long __ipipe_root_status
+__attribute__((alias(__stringify(ipipe_percpu))));
+EXPORT_SYMBOL(__ipipe_root_status);
+
+#endif /* !CONFIG_SMP */
+
+IPIPE_DEFINE_SPINLOCK(__ipipe_lock);
+
+static unsigned long __ipipe_virtual_irq_map;
+
+#ifdef CONFIG_PRINTK
+unsigned int __ipipe_printk_virq;
+int __ipipe_printk_bypass;
+#endif /* CONFIG_PRINTK */
+
+#ifdef CONFIG_PROC_FS
+
+struct proc_dir_entry *ipipe_proc_root;
+
+static int __ipipe_version_info_show(struct seq_file *p, void *data)
+{
+ seq_printf(p, "%d\n", IPIPE_CORE_RELEASE);
+ return 0;
+}
+
+static int __ipipe_version_info_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, __ipipe_version_info_show, NULL);
+}
+
+static const struct file_operations __ipipe_version_proc_ops = {
+ .open = __ipipe_version_info_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int __ipipe_common_info_show(struct seq_file *p, void *data)
+{
+ struct ipipe_domain *ipd = (struct ipipe_domain *)p->private;
+ char handling, lockbit, virtuality;
+ unsigned long ctlbits;
+ unsigned int irq;
+
+ seq_printf(p, " +--- Handled\n");
+ seq_printf(p, " |+-- Locked\n");
+ seq_printf(p, " ||+- Virtual\n");
+ seq_printf(p, " [IRQ] ||| Handler\n");
+
+ mutex_lock(&ipd->mutex);
+
+ for (irq = 0; irq < IPIPE_NR_IRQS; irq++) {
+ ctlbits = ipd->irqs[irq].control;
+ /*
+ * There might be a hole between the last external IRQ
+ * and the first virtual one; skip it.
+ */
+ if (irq >= IPIPE_NR_XIRQS && !ipipe_virtual_irq_p(irq))
+ continue;
+
+ if (ipipe_virtual_irq_p(irq)
+ && !test_bit(irq - IPIPE_VIRQ_BASE, &__ipipe_virtual_irq_map))
+ /* Non-allocated virtual IRQ; skip it. */
+ continue;
+
+ if (ctlbits & IPIPE_HANDLE_MASK)
+ handling = 'H';
+ else
+ handling = '.';
+
+ if (ctlbits & IPIPE_LOCK_MASK)
+ lockbit = 'L';
+ else
+ lockbit = '.';
+
+ if (ipipe_virtual_irq_p(irq))
+ virtuality = 'V';
+ else
+ virtuality = '.';
+
+ if (ctlbits & IPIPE_HANDLE_MASK)
+			seq_printf(p, " %4u: %c%c%c %ps\n",
+ irq, handling, lockbit, virtuality,
+ ipd->irqs[irq].handler);
+ else
+ seq_printf(p, " %4u: %c%c%c\n",
+ irq, handling, lockbit, virtuality);
+ }
+
+ mutex_unlock(&ipd->mutex);
+
+ return 0;
+}
+
+static int __ipipe_common_info_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, __ipipe_common_info_show, PDE_DATA(inode));
+}
+
+static const struct file_operations __ipipe_info_proc_ops = {
+ .owner = THIS_MODULE,
+ .open = __ipipe_common_info_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+void add_domain_proc(struct ipipe_domain *ipd)
+{
+ proc_create_data(ipd->name, 0444, ipipe_proc_root,
+ &__ipipe_info_proc_ops, ipd);
+}
+
+void remove_domain_proc(struct ipipe_domain *ipd)
+{
+ remove_proc_entry(ipd->name, ipipe_proc_root);
+}
+
+void __init __ipipe_init_proc(void)
+{
+ ipipe_proc_root = proc_mkdir("ipipe", NULL);
+ proc_create("version", 0444, ipipe_proc_root,
+ &__ipipe_version_proc_ops);
+ add_domain_proc(ipipe_root_domain);
+
+ __ipipe_init_tracer();
+}
+
+#else
+
+static inline void add_domain_proc(struct ipipe_domain *ipd)
+{
+}
+
+static inline void remove_domain_proc(struct ipipe_domain *ipd)
+{
+}
+
+#endif /* CONFIG_PROC_FS */
+
+static void init_stage(struct ipipe_domain *ipd)
+{
+ memset(&ipd->irqs, 0, sizeof(ipd->irqs));
+ mutex_init(&ipd->mutex);
+ __ipipe_hook_critical_ipi(ipd);
+}
+
+static inline int root_context_offset(void)
+{
+ void root_context_not_at_start_of_ipipe_percpu(void);
+
+	/*
+	 * ipipe_percpu.root must be found at offset #0; if it is not,
+	 * the call below to this purposely undefined routine triggers
+	 * an error at link time.
+	 */
+
+ if (offsetof(struct ipipe_percpu_data, root))
+ root_context_not_at_start_of_ipipe_percpu();
+
+ return 0;
+}
+
+#ifdef CONFIG_SMP
+
+static inline void fixup_percpu_data(void)
+{
+ struct ipipe_percpu_data *p;
+ int cpu;
+
+ /*
+ * ipipe_percpu.curr cannot be assigned statically to
+ * &ipipe_percpu.root, due to the dynamic nature of percpu
+ * data. So we make ipipe_percpu.curr refer to a temporary
+	 * boot-up context in static memory, until we can fix up all
+	 * context pointers in this routine, once the per-cpu areas
+	 * have been set up. The temporary context data is
+ * copied to per_cpu(ipipe_percpu, 0).root in the same move.
+ *
+ * Obviously, this code must run over the boot CPU, before SMP
+ * operations start.
+ */
+ BUG_ON(smp_processor_id() || !irqs_disabled());
+
+ per_cpu(ipipe_percpu, 0).root = bootup_context;
+
+ for_each_possible_cpu(cpu) {
+ p = &per_cpu(ipipe_percpu, cpu);
+ p->curr = &p->root;
+ }
+}
+
+#else /* !CONFIG_SMP */
+
+static inline void fixup_percpu_data(void) { }
+
+#endif /* CONFIG_SMP */
+
+void __init __ipipe_init_early(void)
+{
+ struct ipipe_domain *ipd = &ipipe_root;
+ int cpu;
+
+ fixup_percpu_data();
+
+ /*
+ * A lightweight registration code for the root domain. We are
+ * running on the boot CPU, hw interrupts are off, and
+ * secondary CPUs are still lost in space.
+ */
+ ipd->name = "Linux";
+ ipd->context_offset = root_context_offset();
+ init_stage(ipd);
+
+ /*
+ * Do the early init stuff. First we do the per-arch pipeline
+ * core setup, then we run the per-client setup code. At this
+	 * point, the kernel does not provide many services yet: be
+ * careful.
+ */
+ __ipipe_early_core_setup();
+ __ipipe_early_client_setup();
+
+#ifdef CONFIG_PRINTK
+ __ipipe_printk_virq = ipipe_alloc_virq();
+ ipd->irqs[__ipipe_printk_virq].handler = __ipipe_flush_printk;
+ ipd->irqs[__ipipe_printk_virq].cookie = NULL;
+ ipd->irqs[__ipipe_printk_virq].ackfn = NULL;
+ ipd->irqs[__ipipe_printk_virq].control = IPIPE_HANDLE_MASK;
+#endif /* CONFIG_PRINTK */
+
+ __ipipe_work_virq = ipipe_alloc_virq();
+ ipd->irqs[__ipipe_work_virq].handler = __ipipe_do_work;
+ ipd->irqs[__ipipe_work_virq].cookie = NULL;
+ ipd->irqs[__ipipe_work_virq].ackfn = NULL;
+ ipd->irqs[__ipipe_work_virq].control = IPIPE_HANDLE_MASK;
+
+ for_each_possible_cpu(cpu)
+ per_cpu(work_tail, cpu) = per_cpu(work_buf, cpu);
+}
+
+void __init __ipipe_init(void)
+{
+ /* Now we may engage the pipeline. */
+ __ipipe_enable_pipeline();
+
+ pr_info("Interrupt pipeline (release #%d)\n", IPIPE_CORE_RELEASE);
+}
+
+static inline void init_head_stage(struct ipipe_domain *ipd)
+{
+ struct ipipe_percpu_domain_data *p;
+ int cpu;
+
+ /* Must be set first, used in ipipe_percpu_context(). */
+ ipd->context_offset = offsetof(struct ipipe_percpu_data, head);
+
+ for_each_online_cpu(cpu) {
+ p = ipipe_percpu_context(ipd, cpu);
+ memset(p, 0, sizeof(*p));
+ p->domain = ipd;
+ }
+
+ init_stage(ipd);
+}
+
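+/*
+ * A client core typically registers its head domain once from its own
+ * init code, e.g. (sketch, names are hypothetical):
+ *
+ *	static struct ipipe_domain my_domain;
+ *	...
+ *	ipipe_register_head(&my_domain, "MyCore");
+ */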
+void ipipe_register_head(struct ipipe_domain *ipd, const char *name)
+{
+ BUG_ON(!ipipe_root_p || ipd == &ipipe_root);
+
+ ipd->name = name;
+ init_head_stage(ipd);
+ barrier();
+ ipipe_head_domain = ipd;
+ add_domain_proc(ipd);
+
+ pr_info("I-pipe: head domain %s registered.\n", name);
+}
+EXPORT_SYMBOL_GPL(ipipe_register_head);
+
+void ipipe_unregister_head(struct ipipe_domain *ipd)
+{
+ BUG_ON(!ipipe_root_p || ipd != ipipe_head_domain);
+
+ ipipe_head_domain = &ipipe_root;
+ smp_mb();
+ mutex_lock(&ipd->mutex);
+ remove_domain_proc(ipd);
+ mutex_unlock(&ipd->mutex);
+
+ pr_info("I-pipe: head domain %s unregistered.\n", ipd->name);
+}
+EXPORT_SYMBOL_GPL(ipipe_unregister_head);
+
+void ipipe_stall_root(void)
+{
+ unsigned long flags;
+
+ ipipe_root_only();
+ flags = hard_smp_local_irq_save();
+ __set_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+ hard_smp_local_irq_restore(flags);
+}
+EXPORT_SYMBOL(ipipe_stall_root);
+
+unsigned long ipipe_test_and_stall_root(void)
+{
+ unsigned long flags;
+ int x;
+
+ ipipe_root_only();
+ flags = hard_smp_local_irq_save();
+ x = __test_and_set_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+ hard_smp_local_irq_restore(flags);
+
+ return x;
+}
+EXPORT_SYMBOL(ipipe_test_and_stall_root);
+
+unsigned long ipipe_test_root(void)
+{
+ unsigned long flags;
+ int x;
+
+ flags = hard_smp_local_irq_save();
+ x = test_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+ hard_smp_local_irq_restore(flags);
+
+ return x;
+}
+EXPORT_SYMBOL(ipipe_test_root);
+
+void ipipe_unstall_root(void)
+{
+ struct ipipe_percpu_domain_data *p;
+
+ hard_local_irq_disable();
+
+ /* This helps catching bad usage from assembly call sites. */
+ ipipe_root_only();
+
+ p = ipipe_this_cpu_root_context();
+
+ __clear_bit(IPIPE_STALL_FLAG, &p->status);
+
+ if (unlikely(__ipipe_ipending_p(p)))
+ __ipipe_sync_stage();
+
+ hard_local_irq_enable();
+}
+EXPORT_SYMBOL(ipipe_unstall_root);
+
+void ipipe_restore_root(unsigned long x)
+{
+ ipipe_root_only();
+
+ if (x)
+ ipipe_stall_root();
+ else
+ ipipe_unstall_root();
+}
+EXPORT_SYMBOL(ipipe_restore_root);
+
+void __ipipe_restore_root_nosync(unsigned long x)
+{
+ struct ipipe_percpu_domain_data *p = ipipe_this_cpu_root_context();
+
+ if (raw_irqs_disabled_flags(x)) {
+ __set_bit(IPIPE_STALL_FLAG, &p->status);
+ trace_hardirqs_off();
+ } else {
+ trace_hardirqs_on();
+ __clear_bit(IPIPE_STALL_FLAG, &p->status);
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_restore_root_nosync);
+
+void ipipe_unstall_head(void)
+{
+ struct ipipe_percpu_domain_data *p = ipipe_this_cpu_head_context();
+
+ hard_local_irq_disable();
+
+ __clear_bit(IPIPE_STALL_FLAG, &p->status);
+
+ if (unlikely(__ipipe_ipending_p(p)))
+ __ipipe_sync_pipeline(ipipe_head_domain);
+
+ hard_local_irq_enable();
+}
+EXPORT_SYMBOL_GPL(ipipe_unstall_head);
+
+void __ipipe_restore_head(unsigned long x) /* hw interrupt off */
+{
+ struct ipipe_percpu_domain_data *p = ipipe_this_cpu_head_context();
+
+ if (x) {
+#ifdef CONFIG_DEBUG_KERNEL
+ static int warned;
+ if (!warned &&
+ __test_and_set_bit(IPIPE_STALL_FLAG, &p->status)) {
+ /*
+			 * Already stalled, although ipipe_restore_head()
+ * should have detected it? Send a warning once.
+ */
+ hard_local_irq_enable();
+ warned = 1;
+ pr_warning("I-pipe: ipipe_restore_head() "
+ "optimization failed.\n");
+ dump_stack();
+ hard_local_irq_disable();
+ }
+#else /* !CONFIG_DEBUG_KERNEL */
+ __set_bit(IPIPE_STALL_FLAG, &p->status);
+#endif /* CONFIG_DEBUG_KERNEL */
+ } else {
+ __clear_bit(IPIPE_STALL_FLAG, &p->status);
+ if (unlikely(__ipipe_ipending_p(p)))
+ __ipipe_sync_pipeline(ipipe_head_domain);
+ hard_local_irq_enable();
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_restore_head);
+
+void __ipipe_spin_lock_irq(ipipe_spinlock_t *lock)
+{
+ hard_local_irq_disable();
+ if (ipipe_smp_p)
+ arch_spin_lock(&lock->arch_lock);
+ __set_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_lock_irq);
+
+void __ipipe_spin_unlock_irq(ipipe_spinlock_t *lock)
+{
+ if (ipipe_smp_p)
+ arch_spin_unlock(&lock->arch_lock);
+ __clear_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+ hard_local_irq_enable();
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_unlock_irq);
+
+unsigned long __ipipe_spin_lock_irqsave(ipipe_spinlock_t *lock)
+{
+ unsigned long flags;
+ int s;
+
+ flags = hard_local_irq_save();
+ if (ipipe_smp_p)
+ arch_spin_lock(&lock->arch_lock);
+ s = __test_and_set_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+
+ return arch_mangle_irq_bits(s, flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_lock_irqsave);
+
+int __ipipe_spin_trylock_irqsave(ipipe_spinlock_t *lock,
+ unsigned long *x)
+{
+ unsigned long flags;
+ int s;
+
+ flags = hard_local_irq_save();
+ if (ipipe_smp_p && !arch_spin_trylock(&lock->arch_lock)) {
+ hard_local_irq_restore(flags);
+ return 0;
+ }
+ s = __test_and_set_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+ *x = arch_mangle_irq_bits(s, flags);
+
+ return 1;
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_trylock_irqsave);
+
+void __ipipe_spin_unlock_irqrestore(ipipe_spinlock_t *lock,
+ unsigned long x)
+{
+ if (ipipe_smp_p)
+ arch_spin_unlock(&lock->arch_lock);
+ if (!arch_demangle_irq_bits(&x))
+ __clear_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+ hard_local_irq_restore(x);
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_unlock_irqrestore);
+
+int __ipipe_spin_trylock_irq(ipipe_spinlock_t *lock)
+{
+ unsigned long flags;
+
+ flags = hard_local_irq_save();
+ if (ipipe_smp_p && !arch_spin_trylock(&lock->arch_lock)) {
+ hard_local_irq_restore(flags);
+ return 0;
+ }
+ __set_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+
+ return 1;
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_trylock_irq);
+
+void __ipipe_spin_unlock_irqbegin(ipipe_spinlock_t *lock)
+{
+ if (ipipe_smp_p)
+ arch_spin_unlock(&lock->arch_lock);
+}
+
+void __ipipe_spin_unlock_irqcomplete(unsigned long x)
+{
+ if (!arch_demangle_irq_bits(&x))
+ __clear_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+ hard_local_irq_restore(x);
+}
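+
+/*
+ * Illustrative sketch (hypothetical caller): the irqbegin/irqcomplete
+ * pair splits __ipipe_spin_unlock_irqrestore() in two, so some code
+ * may run after the lock is dropped but before the stall and hw IRQ
+ * states are restored:
+ *
+ *     x = __ipipe_spin_lock_irqsave(&lock);
+ *     ...
+ *     __ipipe_spin_unlock_irqbegin(&lock);
+ *     ... lock released, IRQ state still frozen ...
+ *     __ipipe_spin_unlock_irqcomplete(x);
+ */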
+
+/* Must be called hw IRQs off. */
+static inline void __ipipe_set_irq_held(struct ipipe_percpu_domain_data *p,
+ unsigned int irq)
+{
+ __set_bit(irq, p->irqheld_map);
+ p->irqall[irq]++;
+}
+
+#if __IPIPE_IRQMAP_LEVELS == 4
+
+/* Must be called hw IRQs off. */
+void __ipipe_set_irq_pending(struct ipipe_domain *ipd, unsigned int irq)
+{
+ struct ipipe_percpu_domain_data *p = ipipe_this_cpu_context(ipd);
+ int l0b, l1b, l2b;
+
+ IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+ l0b = irq / (BITS_PER_LONG * BITS_PER_LONG * BITS_PER_LONG);
+ l1b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+ l2b = irq / BITS_PER_LONG;
+
+ if (likely(!test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))) {
+ __set_bit(l0b, &p->irqpend_0map);
+ __set_bit(l1b, p->irqpend_1map);
+ __set_bit(l2b, p->irqpend_2map);
+ __set_bit(irq, p->irqpend_map);
+ } else
+ __set_bit(irq, p->irqheld_map);
+
+ p->irqall[irq]++;
+}
+EXPORT_SYMBOL_GPL(__ipipe_set_irq_pending);
+
+/* Must be called hw IRQs off. */
+void __ipipe_lock_irq(unsigned int irq)
+{
+ struct ipipe_domain *ipd = ipipe_root_domain;
+ struct ipipe_percpu_domain_data *p;
+ int l0b, l1b, l2b;
+
+ IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+ /*
+ * Interrupts requested by a registered head domain cannot be
+ * locked, since this would make no sense: interrupts are
+ * globally masked at CPU level when the head domain is
+ * stalled, so there is no way we could encounter the
+ * situation IRQ locks are handling.
+ */
+ if (test_and_set_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+ return;
+
+ p = ipipe_this_cpu_context(ipd);
+ if (__test_and_clear_bit(irq, p->irqpend_map)) {
+ __set_bit(irq, p->irqheld_map);
+ l2b = irq / BITS_PER_LONG;
+ if (p->irqpend_map[l2b] == 0) {
+ __clear_bit(l2b, p->irqpend_2map);
+ l1b = l2b / BITS_PER_LONG;
+ if (p->irqpend_2map[l1b] == 0) {
+ __clear_bit(l1b, p->irqpend_1map);
+ l0b = l1b / BITS_PER_LONG;
+ if (p->irqpend_1map[l0b] == 0)
+ __clear_bit(l0b, &p->irqpend_0map);
+ }
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_lock_irq);
+
+/* Must be called hw IRQs off. */
+void __ipipe_unlock_irq(unsigned int irq)
+{
+ struct ipipe_domain *ipd = ipipe_root_domain;
+ struct ipipe_percpu_domain_data *p;
+ int l0b, l1b, l2b, cpu;
+
+ IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+ if (!test_and_clear_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+ return;
+
+ l0b = irq / (BITS_PER_LONG * BITS_PER_LONG * BITS_PER_LONG);
+ l1b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+ l2b = irq / BITS_PER_LONG;
+
+ for_each_online_cpu(cpu) {
+ p = ipipe_percpu_context(ipd, cpu);
+ if (test_and_clear_bit(irq, p->irqheld_map)) {
+ /* We need atomic ops here: */
+ set_bit(irq, p->irqpend_map);
+ set_bit(l2b, p->irqpend_2map);
+ set_bit(l1b, p->irqpend_1map);
+ set_bit(l0b, &p->irqpend_0map);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_unlock_irq);
+
+#define wmul1(__n) ((__n) * BITS_PER_LONG)
+#define wmul2(__n) (wmul1(__n) * BITS_PER_LONG)
+#define wmul3(__n) (wmul2(__n) * BITS_PER_LONG)
+
+static inline int __ipipe_next_irq(struct ipipe_percpu_domain_data *p)
+{
+ unsigned long l0m, l1m, l2m, l3m;
+ int l0b, l1b, l2b, l3b;
+ unsigned int irq;
+
+ l0m = p->irqpend_0map;
+ if (unlikely(l0m == 0))
+ return -1;
+ l0b = __ipipe_ffnz(l0m);
+ irq = wmul3(l0b);
+
+ l1m = p->irqpend_1map[l0b];
+ if (unlikely(l1m == 0))
+ return -1;
+ l1b = __ipipe_ffnz(l1m);
+ irq += wmul2(l1b);
+
+ l2m = p->irqpend_2map[wmul1(l0b) + l1b];
+ if (unlikely(l2m == 0))
+ return -1;
+ l2b = __ipipe_ffnz(l2m);
+ irq += wmul1(l2b);
+
+ l3m = p->irqpend_map[wmul2(l0b) + wmul1(l1b) + l2b];
+ if (unlikely(l3m == 0))
+ return -1;
+ l3b = __ipipe_ffnz(l3m);
+ irq += l3b;
+
+ __clear_bit(irq, p->irqpend_map);
+ if (p->irqpend_map[irq / BITS_PER_LONG] == 0) {
+ __clear_bit(l2b, &p->irqpend_2map[wmul1(l0b) + l1b]);
+ if (p->irqpend_2map[wmul1(l0b) + l1b] == 0) {
+ __clear_bit(l1b, &p->irqpend_1map[l0b]);
+ if (p->irqpend_1map[l0b] == 0)
+ __clear_bit(l0b, &p->irqpend_0map);
+ }
+ }
+
+ return irq;
+}
+
+#elif __IPIPE_IRQMAP_LEVELS == 3
+
+/* Must be called hw IRQs off. */
+void __ipipe_set_irq_pending(struct ipipe_domain *ipd, unsigned int irq)
+{
+ struct ipipe_percpu_domain_data *p = ipipe_this_cpu_context(ipd);
+ int l0b, l1b;
+
+ IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+ l0b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+ l1b = irq / BITS_PER_LONG;
+
+ if (likely(!test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))) {
+ __set_bit(irq, p->irqpend_map);
+ __set_bit(l1b, p->irqpend_1map);
+ __set_bit(l0b, &p->irqpend_0map);
+ } else
+ __set_bit(irq, p->irqheld_map);
+
+ p->irqall[irq]++;
+}
+EXPORT_SYMBOL_GPL(__ipipe_set_irq_pending);
+
+/* Must be called hw IRQs off. */
+void __ipipe_lock_irq(unsigned int irq)
+{
+ struct ipipe_domain *ipd = ipipe_root_domain;
+ struct ipipe_percpu_domain_data *p;
+ int l0b, l1b;
+
+ IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+ /*
+ * Interrupts requested by a registered head domain cannot be
+ * locked, since this would make no sense: interrupts are
+ * globally masked at CPU level when the head domain is
+ * stalled, so there is no way we could encounter the
+ * situation IRQ locks are handling.
+ */
+ if (test_and_set_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+ return;
+
+ l0b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+ l1b = irq / BITS_PER_LONG;
+
+ p = ipipe_this_cpu_context(ipd);
+ if (__test_and_clear_bit(irq, p->irqpend_map)) {
+ __set_bit(irq, p->irqheld_map);
+ if (p->irqpend_map[l1b] == 0) {
+ __clear_bit(l1b, p->irqpend_1map);
+ if (p->irqpend_1map[l0b] == 0)
+ __clear_bit(l0b, &p->irqpend_0map);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_lock_irq);
+
+/* Must be called hw IRQs off. */
+void __ipipe_unlock_irq(unsigned int irq)
+{
+ struct ipipe_domain *ipd = ipipe_root_domain;
+ struct ipipe_percpu_domain_data *p;
+ int l0b, l1b, cpu;
+
+ IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+ if (!test_and_clear_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+ return;
+
+ l0b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+ l1b = irq / BITS_PER_LONG;
+
+ for_each_online_cpu(cpu) {
+ p = ipipe_percpu_context(ipd, cpu);
+ if (test_and_clear_bit(irq, p->irqheld_map)) {
+ /* We need atomic ops here: */
+ set_bit(irq, p->irqpend_map);
+ set_bit(l1b, p->irqpend_1map);
+ set_bit(l0b, &p->irqpend_0map);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_unlock_irq);
+
+static inline int __ipipe_next_irq(struct ipipe_percpu_domain_data *p)
+{
+ int l0b, l1b, l2b;
+ unsigned long l0m, l1m, l2m;
+ unsigned int irq;
+
+ l0m = p->irqpend_0map;
+ if (unlikely(l0m == 0))
+ return -1;
+
+ l0b = __ipipe_ffnz(l0m);
+ l1m = p->irqpend_1map[l0b];
+ if (unlikely(l1m == 0))
+ return -1;
+
+ l1b = __ipipe_ffnz(l1m) + l0b * BITS_PER_LONG;
+ l2m = p->irqpend_map[l1b];
+ if (unlikely(l2m == 0))
+ return -1;
+
+ l2b = __ipipe_ffnz(l2m);
+ irq = l1b * BITS_PER_LONG + l2b;
+
+ __clear_bit(irq, p->irqpend_map);
+ if (p->irqpend_map[l1b] == 0) {
+ __clear_bit(l1b, p->irqpend_1map);
+ if (p->irqpend_1map[l0b] == 0)
+ __clear_bit(l0b, &p->irqpend_0map);
+ }
+
+ return irq;
+}
+
+#else /* __IPIPE_IRQMAP_LEVELS == 2 */
+
+/* Must be called hw IRQs off. */
+void __ipipe_set_irq_pending(struct ipipe_domain *ipd, unsigned int irq)
+{
+ struct ipipe_percpu_domain_data *p = ipipe_this_cpu_context(ipd);
+ int l0b = irq / BITS_PER_LONG;
+
+ IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+ if (likely(!test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))) {
+ __set_bit(irq, p->irqpend_map);
+ __set_bit(l0b, &p->irqpend_0map);
+ } else
+ __set_bit(irq, p->irqheld_map);
+
+ p->irqall[irq]++;
+}
+EXPORT_SYMBOL_GPL(__ipipe_set_irq_pending);
+
+/* Must be called hw IRQs off. */
+void __ipipe_lock_irq(unsigned int irq)
+{
+ struct ipipe_percpu_domain_data *p;
+ int l0b = irq / BITS_PER_LONG;
+
+ IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+ if (test_and_set_bit(IPIPE_LOCK_FLAG,
+ &ipipe_root_domain->irqs[irq].control))
+ return;
+
+ p = ipipe_this_cpu_root_context();
+ if (__test_and_clear_bit(irq, p->irqpend_map)) {
+ __set_bit(irq, p->irqheld_map);
+ if (p->irqpend_map[l0b] == 0)
+ __clear_bit(l0b, &p->irqpend_0map);
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_lock_irq);
+
+/* Must be called hw IRQs off. */
+void __ipipe_unlock_irq(unsigned int irq)
+{
+ struct ipipe_domain *ipd = ipipe_root_domain;
+ struct ipipe_percpu_domain_data *p;
+ int l0b = irq / BITS_PER_LONG, cpu;
+
+ IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+ if (!test_and_clear_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+ return;
+
+ for_each_online_cpu(cpu) {
+ p = ipipe_percpu_context(ipd, cpu);
+ if (test_and_clear_bit(irq, p->irqheld_map)) {
+ /* We need atomic ops here: */
+ set_bit(irq, p->irqpend_map);
+ set_bit(l0b, &p->irqpend_0map);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_unlock_irq);
+
+static inline int __ipipe_next_irq(struct ipipe_percpu_domain_data *p)
+{
+ unsigned long l0m, l1m;
+ int l0b, l1b;
+
+ l0m = p->irqpend_0map;
+ if (unlikely(l0m == 0))
+ return -1;
+
+ l0b = __ipipe_ffnz(l0m);
+ l1m = p->irqpend_map[l0b];
+ if (unlikely(l1m == 0))
+ return -1;
+
+ l1b = __ipipe_ffnz(l1m);
+ __clear_bit(l1b, &p->irqpend_map[l0b]);
+ if (p->irqpend_map[l0b] == 0)
+ __clear_bit(l0b, &p->irqpend_0map);
+
+ return l0b * BITS_PER_LONG + l1b;
+}
+
+#endif
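+
+/*
+ * Worked example of the pending-IRQ map indexing, assuming the
+ * 2-level layout and BITS_PER_LONG == 64: for irq = 150, l0b = 150 /
+ * 64 = 2, so bit 2 of irqpend_0map records that word 2 of irqpend_map
+ * holds at least one pending IRQ, and bit 150 of irqpend_map
+ * identifies the IRQ itself. __ipipe_next_irq() walks the levels
+ * top-down with __ipipe_ffnz() (2 * 64 + 22 = 150 here) and clears
+ * the summary bits on the way out once a word becomes empty.
+ */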
+
+void __ipipe_do_sync_pipeline(struct ipipe_domain *top)
+{
+ struct ipipe_percpu_domain_data *p;
+ struct ipipe_domain *ipd;
+
+ /* We must enter over the root domain. */
+ IPIPE_WARN_ONCE(__ipipe_current_domain != ipipe_root_domain);
+ ipd = top;
+next:
+ p = ipipe_this_cpu_context(ipd);
+ if (test_bit(IPIPE_STALL_FLAG, &p->status))
+ return;
+
+ if (__ipipe_ipending_p(p)) {
+ if (ipd == ipipe_root_domain)
+ __ipipe_sync_stage();
+ else {
+ /* Switching to head. */
+ p->coflags &= ~__IPIPE_ALL_R;
+ __ipipe_set_current_context(p);
+ __ipipe_sync_stage();
+ __ipipe_set_current_domain(ipipe_root_domain);
+ }
+ }
+
+ if (ipd != ipipe_root_domain) {
+ ipd = ipipe_root_domain;
+ goto next;
+ }
+}
+EXPORT_SYMBOL_GPL(__ipipe_do_sync_pipeline);
+
+unsigned int ipipe_alloc_virq(void)
+{
+ unsigned long flags, irq = 0;
+ int ipos;
+
+ raw_spin_lock_irqsave(&__ipipe_lock, flags);
+
+ if (__ipipe_virtual_irq_map != ~0) {
+ ipos = ffz(__ipipe_virtual_irq_map);
+ set_bit(ipos, &__ipipe_virtual_irq_map);
+ irq = ipos + IPIPE_VIRQ_BASE;
+ }
+
+ raw_spin_unlock_irqrestore(&__ipipe_lock, flags);
+
+ return irq;
+}
+EXPORT_SYMBOL_GPL(ipipe_alloc_virq);
+
+void ipipe_free_virq(unsigned int virq)
+{
+ clear_bit(virq - IPIPE_VIRQ_BASE, &__ipipe_virtual_irq_map);
+ smp_mb__after_atomic();
+}
+EXPORT_SYMBOL_GPL(ipipe_free_virq);
+
+int ipipe_request_irq(struct ipipe_domain *ipd,
+ unsigned int irq,
+ ipipe_irq_handler_t handler,
+ void *cookie,
+ ipipe_irq_ackfn_t ackfn)
+{
+ unsigned long flags;
+ int ret = 0;
+
+ ipipe_root_only();
+
+ if (handler == NULL ||
+ (irq >= IPIPE_NR_XIRQS && !ipipe_virtual_irq_p(irq)))
+ return -EINVAL;
+
+ raw_spin_lock_irqsave(&__ipipe_lock, flags);
+
+ if (ipd->irqs[irq].handler) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ if (ackfn == NULL)
+ ackfn = ipipe_root_domain->irqs[irq].ackfn;
+
+ ipd->irqs[irq].handler = handler;
+ ipd->irqs[irq].cookie = cookie;
+ ipd->irqs[irq].ackfn = ackfn;
+ ipd->irqs[irq].control = IPIPE_HANDLE_MASK;
+out:
+ raw_spin_unlock_irqrestore(&__ipipe_lock, flags);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(ipipe_request_irq);
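+
+/*
+ * Illustrative sketch of the virtual IRQ API (hypothetical handler
+ * name, a registered head domain is assumed):
+ *
+ *     static void my_virq_handler(unsigned int irq, void *cookie);
+ *
+ *     unsigned int virq = ipipe_alloc_virq();
+ *
+ *     if (virq == 0)
+ *             return -EBUSY;
+ *     ipipe_request_irq(ipipe_head_domain, virq, my_virq_handler,
+ *                       NULL, NULL);
+ *     ipipe_raise_irq(virq);
+ *
+ * The handler then runs from the head domain interrupt log; pair this
+ * with ipipe_free_irq() and ipipe_free_virq() on the teardown path.
+ */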
+
+void ipipe_free_irq(struct ipipe_domain *ipd,
+ unsigned int irq)
+{
+ unsigned long flags;
+
+ ipipe_root_only();
+
+ raw_spin_lock_irqsave(&__ipipe_lock, flags);
+
+ if (ipd->irqs[irq].handler == NULL)
+ goto out;
+
+ ipd->irqs[irq].handler = NULL;
+ ipd->irqs[irq].cookie = NULL;
+ ipd->irqs[irq].ackfn = NULL;
+ ipd->irqs[irq].control = 0;
+out:
+ raw_spin_unlock_irqrestore(&__ipipe_lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipipe_free_irq);
+
+void ipipe_set_hooks(struct ipipe_domain *ipd, int enables)
+{
+ struct ipipe_percpu_domain_data *p;
+ unsigned long flags;
+ int cpu, wait;
+
+ if (ipd == ipipe_root_domain) {
+ IPIPE_WARN(enables & __IPIPE_TRAP_E);
+ enables &= ~__IPIPE_TRAP_E;
+ } else {
+ IPIPE_WARN(enables & __IPIPE_KEVENT_E);
+ enables &= ~__IPIPE_KEVENT_E;
+ }
+
+ flags = ipipe_critical_enter(NULL);
+
+ for_each_online_cpu(cpu) {
+ p = ipipe_percpu_context(ipd, cpu);
+ p->coflags &= ~__IPIPE_ALL_E;
+ p->coflags |= enables;
+ }
+
+ wait = (enables ^ __IPIPE_ALL_E) << __IPIPE_SHIFT_R;
+ if (wait == 0 || !__ipipe_root_p) {
+ ipipe_critical_exit(flags);
+ return;
+ }
+
+ ipipe_this_cpu_context(ipd)->coflags &= ~wait;
+
+ ipipe_critical_exit(flags);
+
+ /*
+ * In case we cleared some hooks over the root domain, we have
+ * to wait for any ongoing execution to finish, since our
+ * caller might subsequently unmap the target domain code.
+ *
+ * We synchronize with the relevant __ipipe_notify_*()
+ * helpers, disabling all hooks before we start waiting for
+ * completion on all CPUs.
+ */
+ for_each_online_cpu(cpu) {
+ while (ipipe_percpu_context(ipd, cpu)->coflags & wait)
+ schedule_timeout_interruptible(HZ / 50);
+ }
+}
+EXPORT_SYMBOL_GPL(ipipe_set_hooks);
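+
+/*
+ * Illustrative sketch (hypothetical call sites): a co-kernel typically
+ * enables its notification hooks once it is ready to receive events,
+ * and clears them again before unmapping its code. Note the asymmetry
+ * enforced above: trap events can only be hooked on the head domain,
+ * kernel events only on the root domain.
+ *
+ *     ipipe_set_hooks(ipipe_head_domain, __IPIPE_SYSCALL_E | __IPIPE_TRAP_E);
+ *     ipipe_set_hooks(ipipe_root_domain, __IPIPE_KEVENT_E);
+ *     ...
+ *     ipipe_set_hooks(ipipe_head_domain, 0);
+ *     ipipe_set_hooks(ipipe_root_domain, 0);
+ */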
+
+int __weak ipipe_fastcall_hook(struct pt_regs *regs)
+{
+ return rtai_fastcall_hook ? rtai_fastcall_hook(regs) : -1;
+}
+
+int __ipipe_notify_syscall(struct pt_regs *regs)
+{
+ return rtai_syscall_hook ? rtai_syscall_hook(regs) : 0;
+}
+
+static inline void sync_root_irqs(void)
+{
+ struct ipipe_percpu_domain_data *p;
+ unsigned long flags;
+
+ flags = hard_local_irq_save();
+
+ p = ipipe_this_cpu_root_context();
+ if (unlikely(__ipipe_ipending_p(p)))
+ __ipipe_sync_stage();
+
+ hard_local_irq_restore(flags);
+}
+
+int ipipe_handle_syscall(struct thread_info *ti,
+ unsigned long nr, struct pt_regs *regs)
+{
+ unsigned long local_flags = READ_ONCE(ti->ipipe_flags);
+ unsigned int nr_syscalls = ipipe_root_nr_syscalls(ti);
+ int ret;
+
+ /*
+ * NOTE: This is a backport from the DOVETAIL syscall
+ * redirector to the older pipeline implementation.
+ *
+ * ==
+ *
+ * If the syscall # is out of bounds and the current IRQ stage
+ * is not the root one, this has to be a non-native system
+ * call handled by some co-kernel on the head stage. Hand it
+ * over to the head stage via the fast syscall handler.
+ *
+ * Otherwise, if the system call is out of bounds or the
+ * current thread is shared with a co-kernel, hand the syscall
+ * over to the latter through the pipeline stages. This
+ * allows:
+ *
+ * - the co-kernel to receive the initial - foreign - syscall
+ * a thread should send for enabling syscall handling by the
+ * co-kernel.
+ *
+ * - the co-kernel to manipulate the current execution stage
+ * for handling the request, which includes switching the
+ * current thread back to the root stage if the syscall is a
+ * native one, or promoting it to the head stage if handling
+ * the foreign syscall requires this.
+ *
+ * Native syscalls from regular (non-pipeline) threads are
+ * ignored by this routine, and flow down to the regular
+ * system call handler.
+ */
+
+ if (nr >= nr_syscalls && (local_flags & _TIP_HEAD)) {
+ ipipe_fastcall_hook(regs);
+ local_flags = READ_ONCE(ti->ipipe_flags);
+ if (local_flags & _TIP_HEAD) {
+ if (local_flags & _TIP_MAYDAY)
+ __ipipe_call_mayday(regs);
+ return 1; /* don't pass down, no tail work. */
+ } else {
+ sync_root_irqs();
+ return -1; /* don't pass down, do tail work. */
+ }
+ }
+
+ if ((local_flags & _TIP_NOTIFY) || nr >= nr_syscalls) {
+ ret = __ipipe_notify_syscall(regs);
+ local_flags = READ_ONCE(ti->ipipe_flags);
+ if (local_flags & _TIP_HEAD)
+ return 1; /* don't pass down, no tail work. */
+ if (ret)
+ return -1; /* don't pass down, do tail work. */
+ }
+
+ return 0; /* pass syscall down to the host. */
+}
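+
+/*
+ * Sketch of how an arch syscall entry path is expected to consume the
+ * return value above (hypothetical caller and labels, following the
+ * comments in ipipe_handle_syscall()):
+ *
+ *     ret = ipipe_handle_syscall(ti, nr, regs);
+ *     if (ret > 0)
+ *             return;              (handled, no tail work)
+ *     if (ret < 0)
+ *             goto tail_work;      (handled, tail work only)
+ *     handle_syscall(regs);        (ret == 0, regular path)
+ */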
+
+int __Ipipe_notify_syscall(struct pt_regs *regs)
+{
+ struct ipipe_domain *caller_domain, *this_domain, *ipd;
+ struct ipipe_percpu_domain_data *p;
+ unsigned long flags;
+ int ret = 0;
+
+ /*
+ * We should definitely not pipeline a syscall with IRQs off.
+ */
+ IPIPE_WARN_ONCE(hard_irqs_disabled());
+
+ flags = hard_local_irq_save();
+ caller_domain = this_domain = __ipipe_current_domain;
+ ipd = ipipe_head_domain;
+next:
+ p = ipipe_this_cpu_context(ipd);
+ if (likely(p->coflags & __IPIPE_SYSCALL_E)) {
+ __ipipe_set_current_context(p);
+ p->coflags |= __IPIPE_SYSCALL_R;
+ hard_local_irq_restore(flags);
+// ret = ipipe_syscall_hook(caller_domain, regs);
+ flags = hard_local_irq_save();
+ p->coflags &= ~__IPIPE_SYSCALL_R;
+ if (__ipipe_current_domain != ipd)
+ /* Account for domain migration. */
+ this_domain = __ipipe_current_domain;
+ else
+ __ipipe_set_current_domain(this_domain);
+ }
+
+ if (this_domain == ipipe_root_domain) {
+ if (ipd != ipipe_root_domain && ret == 0) {
+ ipd = ipipe_root_domain;
+ goto next;
+ }
+ /*
+ * Careful: we may have migrated from head->root, so p
+ * would be ipipe_this_cpu_context(head).
+ */
+ p = ipipe_this_cpu_root_context();
+ if (__ipipe_ipending_p(p))
+ __ipipe_sync_stage();
+ } else if (ipipe_test_thread_flag(TIP_MAYDAY))
+ __ipipe_call_mayday(regs);
+
+ hard_local_irq_restore(flags);
+
+ return ret;
+}
+
+int __ipipe_notify_trap(int exception, struct pt_regs *regs)
+{
+ return (!__ipipe_root_p && rtai_trap_hook) ? rtai_trap_hook(exception, regs) : 0;
+}
+
+int __Ipipe_notify_trap(int exception, struct pt_regs *regs)
+{
+ struct ipipe_percpu_domain_data *p;
+ struct ipipe_trap_data data;
+ unsigned long flags;
+ int ret = 0;
+
+ flags = hard_local_irq_save();
+
+ /*
+ * We send a notification about all traps raised over a
+ * registered head domain only.
+ */
+ if (__ipipe_root_p)
+ goto out;
+
+ p = ipipe_this_cpu_head_context();
+ if (likely(p->coflags & __IPIPE_TRAP_E)) {
+ p->coflags |= __IPIPE_TRAP_R;
+ hard_local_irq_restore(flags);
+ data.exception = exception;
+ data.regs = regs;
+// ret = ipipe_trap_hook(&data);
+ flags = hard_local_irq_save();
+ p->coflags &= ~__IPIPE_TRAP_R;
+ }
+out:
+ hard_local_irq_restore(flags);
+
+ return ret;
+}
+
+int __ipipe_notify_user_intreturn(void)
+{
+ __ipipe_notify_kevent(IPIPE_KEVT_USERINTRET, current);
+
+ return !ipipe_root_p;
+}
+
+int __ipipe_notify_kevent(int kevent, void *data)
+{
+ ipipe_root_only();
+ return rtai_kevent_hook ? rtai_kevent_hook(kevent, data) : 0;
+}
+
+int __Ipipe_notify_kevent(int kevent, void *data)
+{
+ struct ipipe_percpu_domain_data *p;
+ unsigned long flags;
+ int ret = 0;
+
+ ipipe_root_only();
+
+ flags = hard_local_irq_save();
+
+ p = ipipe_this_cpu_root_context();
+ if (likely(p->coflags & __IPIPE_KEVENT_E)) {
+ p->coflags |= __IPIPE_KEVENT_R;
+ hard_local_irq_restore(flags);
+// ret = ipipe_kevent_hook(kevent, data);
+ flags = hard_local_irq_save();
+ p->coflags &= ~__IPIPE_KEVENT_R;
+ }
+
+ hard_local_irq_restore(flags);
+
+ return ret;
+}
+
+inline void ipipe_migration_hook(struct task_struct *p)
+{
+	if (rtai_migration_hook)
+		rtai_migration_hook(p);
+}
+
+static void complete_domain_migration(void) /* hw IRQs off */
+{
+ struct ipipe_percpu_domain_data *p;
+ struct ipipe_percpu_data *pd;
+ struct task_struct *t;
+
+ ipipe_root_only();
+ pd = raw_cpu_ptr(&ipipe_percpu);
+ t = pd->task_hijacked;
+ if (t == NULL)
+ return;
+
+ pd->task_hijacked = NULL;
+ t->state &= ~TASK_HARDENING;
+ if (t->state != TASK_INTERRUPTIBLE)
+ /* Migration aborted (by signal). */
+ return;
+
+ ipipe_set_ti_thread_flag(task_thread_info(t), TIP_HEAD);
+ p = ipipe_this_cpu_head_context();
+ IPIPE_WARN_ONCE(test_bit(IPIPE_STALL_FLAG, &p->status));
+ /*
+ * hw IRQs are disabled, but the completion hook assumes the
+ * head domain is logically stalled: fix it up.
+ */
+ __set_bit(IPIPE_STALL_FLAG, &p->status);
+ ipipe_migration_hook(t);
+ __clear_bit(IPIPE_STALL_FLAG, &p->status);
+ if (__ipipe_ipending_p(p))
+ __ipipe_sync_pipeline(p->domain);
+}
+
+void __ipipe_complete_domain_migration(void)
+{
+ unsigned long flags;
+
+ flags = hard_local_irq_save();
+ complete_domain_migration();
+ hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_complete_domain_migration);
+
+int __ipipe_switch_tail(void)
+{
+ int x;
+
+#ifdef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
+ hard_local_irq_disable();
+#endif
+ x = __ipipe_root_p;
+ if (x)
+ complete_domain_migration();
+
+#ifndef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
+ if (x)
+#endif
+ hard_local_irq_enable();
+
+ return !x;
+}
+
+void __ipipe_notify_vm_preemption(void)
+{
+ struct ipipe_vm_notifier *vmf;
+ struct ipipe_percpu_data *p;
+
+ ipipe_check_irqoff();
+ p = __ipipe_raw_cpu_ptr(&ipipe_percpu);
+ vmf = p->vm_notifier;
+ if (unlikely(vmf))
+ vmf->handler(vmf);
+}
+EXPORT_SYMBOL_GPL(__ipipe_notify_vm_preemption);
+
+static void Dispatch_irq_head(unsigned int irq) /* hw interrupts off */
+{
+ struct ipipe_percpu_domain_data *p = ipipe_this_cpu_head_context(), *old;
+ struct ipipe_domain *head = p->domain;
+
+ if (unlikely(test_bit(IPIPE_STALL_FLAG, &p->status))) {
+ __ipipe_set_irq_pending(head, irq);
+ return;
+ }
+
+ /* Switch to the head domain if not current. */
+ old = __ipipe_current_context;
+ if (old != p)
+ __ipipe_set_current_context(p);
+
+ p->irqall[irq]++;
+ __set_bit(IPIPE_STALL_FLAG, &p->status);
+ barrier();
+ head->irqs[irq].handler(irq, head->irqs[irq].cookie);
+ __ipipe_run_irqtail(irq);
+ hard_local_irq_disable();
+ p = ipipe_this_cpu_head_context();
+ __clear_bit(IPIPE_STALL_FLAG, &p->status);
+
+ /* Are we still running in the head domain? */
+ if (likely(__ipipe_current_context == p)) {
+ /* Did we enter this code over the head domain? */
+ if (old->domain == head) {
+ /* Yes, do immediate synchronization. */
+ if (__ipipe_ipending_p(p))
+ __ipipe_sync_stage();
+ return;
+ }
+ __ipipe_set_current_context(ipipe_this_cpu_root_context());
+ }
+
+ /*
+ * We must be running over the root domain, synchronize
+ * the pipeline for high priority IRQs (slow path).
+ */
+ __ipipe_do_sync_pipeline(head);
+}
+
+void __ipipe_dispatch_irq(unsigned int irq, int flags) /* hw interrupts off */
+{
+ struct ipipe_domain *ipd;
+ struct irq_desc *desc;
+ unsigned long control;
+ int chained_irq;
+
+ /*
+ * Survival kit when reading this code:
+ *
+ * - we have two main situations, leading to three cases for
+ * handling interrupts:
+ *
+ * a) the root domain is alone, no registered head domain
+ * => all interrupts go through the interrupt log
+ * b) a head domain is registered
+ * => head domain IRQs go through the fast dispatcher
+ * => root domain IRQs go through the interrupt log
+ *
+ * - when no head domain is registered, ipipe_head_domain ==
+ * ipipe_root_domain == &ipipe_root.
+ *
+ * - the caller tells us whether we should acknowledge this
+ * IRQ. Even virtual IRQs may require acknowledge on some
+ * platforms (e.g. arm/SMP).
+ *
+ * - the caller tells us whether we may try to run the IRQ log
+ * syncer. Typically, demuxed IRQs won't be synced
+ * immediately.
+ *
+ * - multiplex IRQs most likely have a valid acknowledge
+ * handler and we may not be called with IPIPE_IRQF_NOACK
+ * for them. The ack handler for the multiplex IRQ actually
+ * decodes the demuxed interrupts.
+ */
+
+#ifdef CONFIG_IPIPE_DEBUG
+ if (irq >= IPIPE_NR_IRQS) {
+ pr_err("I-pipe: spurious interrupt %u\n", irq);
+ return;
+ }
+#endif
+ /*
+ * CAUTION: on some archs, virtual IRQs may have acknowledge
+ * handlers. Multiplex IRQs should have one too.
+ */
+ if (unlikely(irq >= IPIPE_NR_XIRQS)) {
+ desc = NULL;
+ chained_irq = 0;
+ } else {
+ desc = irq_to_desc(irq);
+ chained_irq = desc ? ipipe_chained_irq_p(desc) : 0;
+ }
+ if (flags & IPIPE_IRQF_NOACK)
+ IPIPE_WARN_ONCE(chained_irq);
+ else {
+ ipd = ipipe_head_domain;
+ control = ipd->irqs[irq].control;
+ if ((control & IPIPE_HANDLE_MASK) == 0)
+ ipd = ipipe_root_domain;
+ if (ipd->irqs[irq].ackfn)
+ ipd->irqs[irq].ackfn(desc);
+ if (chained_irq) {
+ if ((flags & IPIPE_IRQF_NOSYNC) == 0)
+ /* Run demuxed IRQ handlers. */
+ goto sync;
+ return;
+ }
+ }
+
+ /*
+ * Sticky interrupts must be handled early and separately, so
+ * that we always process them on the current domain.
+ */
+ ipd = __ipipe_current_domain;
+ control = ipd->irqs[irq].control;
+ if (control & IPIPE_STICKY_MASK)
+ goto log;
+
+ /*
+ * In case we have no registered head domain
+ * (i.e. ipipe_head_domain == &ipipe_root), we always go
+ * through the interrupt log, and leave the dispatching work
+ * ultimately to __ipipe_sync_pipeline().
+ */
+ ipd = ipipe_head_domain;
+ control = ipd->irqs[irq].control;
+ if (ipd == ipipe_root_domain)
+ /*
+ * The root domain must handle all interrupts, so
+ * testing the HANDLE bit would be pointless.
+ */
+ goto log;
+
+ if (control & IPIPE_HANDLE_MASK) {
+ if (unlikely(flags & IPIPE_IRQF_NOSYNC))
+ __ipipe_set_irq_pending(ipd, irq);
+ else
+ dispatch_irq_head(irq);
+ return;
+ }
+
+ ipd = ipipe_root_domain;
+log:
+ __ipipe_set_irq_pending(ipd, irq);
+
+ if (flags & IPIPE_IRQF_NOSYNC)
+ return;
+
+ /*
+ * Optimize if we preempted a registered high priority head
+ * domain: we don't need to synchronize the pipeline unless
+ * there is a pending interrupt for it.
+ */
+ if (!__ipipe_root_p &&
+ !__ipipe_ipending_p(ipipe_this_cpu_head_context()))
+ return;
+sync:
+ __ipipe_sync_pipeline(ipipe_head_domain);
+}
+EXPORT_SYMBOL_GPL(__ipipe_dispatch_irq);
+
+void ipipe_raise_irq(unsigned int irq)
+{
+ struct ipipe_domain *ipd = ipipe_head_domain;
+ unsigned long flags, control;
+
+ flags = hard_local_irq_save();
+
+ /*
+ * Fast path: raising a virtual IRQ handled by the head
+ * domain.
+ */
+ if (likely(ipipe_virtual_irq_p(irq) && ipd != ipipe_root_domain)) {
+ control = ipd->irqs[irq].control;
+ if (likely(control & IPIPE_HANDLE_MASK)) {
+ dispatch_irq_head(irq);
+ goto out;
+ }
+ }
+
+ /* Emulate regular device IRQ receipt. */
+ __ipipe_dispatch_irq(irq, IPIPE_IRQF_NOACK);
+out:
+ hard_local_irq_restore(flags);
+
+}
+EXPORT_SYMBOL_GPL(ipipe_raise_irq);
+
+#ifdef CONFIG_PREEMPT
+
+void preempt_schedule_irq(void);
+
+void __sched __ipipe_preempt_schedule_irq(void)
+{
+ struct ipipe_percpu_domain_data *p;
+ unsigned long flags;
+
+ if (WARN_ON_ONCE(!hard_irqs_disabled()))
+ hard_local_irq_disable();
+
+ local_irq_save(flags);
+ hard_local_irq_enable();
+ preempt_schedule_irq(); /* Ok, may reschedule now. */
+ hard_local_irq_disable();
+
+ /*
+ * Flush any pending interrupt that may have been logged after
+ * preempt_schedule_irq() stalled the root stage before
+ * returning to us, and now.
+ */
+ p = ipipe_this_cpu_root_context();
+ if (unlikely(__ipipe_ipending_p(p))) {
+ trace_hardirqs_on();
+ __clear_bit(IPIPE_STALL_FLAG, &p->status);
+ __ipipe_sync_stage();
+ }
+
+ __ipipe_restore_root_nosync(flags);
+}
+
+#else /* !CONFIG_PREEMPT */
+
+#define __ipipe_preempt_schedule_irq() do { } while (0)
+
+#endif /* !CONFIG_PREEMPT */
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+#define root_stall_after_handler() local_irq_disable()
+#else
+#define root_stall_after_handler() do { } while (0)
+#endif
+
+/*
+ * __ipipe_do_sync_stage() -- Flush the pending IRQs for the current
+ * domain (and processor). This routine flushes the interrupt log (see
+ * "Optimistic interrupt protection" from D. Stodolsky et al. for more
+ * on the deferred interrupt scheme). Every interrupt that occurred
+ * while the pipeline was stalled gets played.
+ *
+ * WARNING: CPU migration may occur over this routine.
+ */
+void __ipipe_do_sync_stage(void)
+{
+ struct ipipe_percpu_domain_data *p;
+ struct ipipe_domain *ipd;
+ int irq;
+
+ p = __ipipe_current_context;
+respin:
+ ipd = p->domain;
+
+ __set_bit(IPIPE_STALL_FLAG, &p->status);
+ smp_wmb();
+
+ if (ipd == ipipe_root_domain)
+ trace_hardirqs_off();
+
+ for (;;) {
+ irq = __ipipe_next_irq(p);
+ if (irq < 0)
+ break;
+ /*
+ * Make sure the compiler does not reorder wrongly, so
+ * that all updates to maps are done before the
+ * handler gets called.
+ */
+ barrier();
+
+ if (test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+ continue;
+
+ if (ipd != ipipe_head_domain)
+ hard_local_irq_enable();
+
+ if (likely(ipd != ipipe_root_domain)) {
+ ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie);
+ __ipipe_run_irqtail(irq);
+ hard_local_irq_disable();
+ } else if (ipipe_virtual_irq_p(irq)) {
+ irq_enter();
+ ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie);
+ irq_exit();
+ root_stall_after_handler();
+ hard_local_irq_disable();
+ } else {
+ ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie);
+ root_stall_after_handler();
+ hard_local_irq_disable();
+ }
+
+ /*
+ * We may have migrated to a different CPU (1) upon
+ * return from the handler, or downgraded from the
+ * head domain to the root one (2), the opposite way
+ * is NOT allowed though.
+ *
+ * (1) reload the current per-cpu context pointer, so
+ * that we further pull pending interrupts from the
+ * proper per-cpu log.
+ *
+ * (2) check the stall bit to know whether we may
+ * dispatch any interrupt pending for the root domain,
+ * and respin the entire dispatch loop if
+ * so. Otherwise, immediately return to the caller,
+ * _without_ affecting the stall state for the root
+ * domain, since we do not own it at this stage. This
+ * case is basically reflecting what may happen in
+ * dispatch_irq_head() for the fast path.
+ */
+ p = __ipipe_current_context;
+ if (p->domain != ipd) {
+ IPIPE_BUG_ON(ipd == ipipe_root_domain);
+ if (test_bit(IPIPE_STALL_FLAG, &p->status))
+ return;
+ goto respin;
+ }
+ }
+
+ if (ipd == ipipe_root_domain)
+ trace_hardirqs_on();
+
+ __clear_bit(IPIPE_STALL_FLAG, &p->status);
+}
+EXPORT_SYMBOL_GPL(__ipipe_do_sync_stage);
+
+void __ipipe_call_mayday(struct pt_regs *regs)
+{
+ unsigned long flags;
+
+ ipipe_clear_thread_flag(TIP_MAYDAY);
+ flags = hard_local_irq_save();
+ __ipipe_notify_trap(IPIPE_TRAP_MAYDAY, regs);
+ hard_local_irq_restore(flags);
+}
+
+#ifdef CONFIG_SMP
+
+/* Always called with hw interrupts off. */
+void __ipipe_do_critical_sync(unsigned int irq, void *cookie)
+{
+ int cpu = ipipe_processor_id();
+
+ cpumask_set_cpu(cpu, &__ipipe_cpu_sync_map);
+
+ /*
+ * Now we are in sync with the lock requestor running on
+ * another CPU. Enter a spinning wait until he releases the
+ * global lock.
+ */
+ raw_spin_lock(&__ipipe_cpu_barrier);
+
+ /* Got it. Now get out. */
+
+ /* Call the sync routine if any. */
+ if (__ipipe_cpu_sync)
+ __ipipe_cpu_sync();
+
+ cpumask_set_cpu(cpu, &__ipipe_cpu_pass_map);
+
+ raw_spin_unlock(&__ipipe_cpu_barrier);
+
+ cpumask_clear_cpu(cpu, &__ipipe_cpu_sync_map);
+}
+#endif /* CONFIG_SMP */
+
+unsigned long ipipe_critical_enter(void (*syncfn)(void))
+{
+ static cpumask_t allbutself __maybe_unused, online __maybe_unused;
+ int cpu __maybe_unused, n __maybe_unused;
+ unsigned long flags, loops __maybe_unused;
+
+ flags = hard_local_irq_save();
+
+ if (num_online_cpus() == 1)
+ return flags;
+
+#ifdef CONFIG_SMP
+
+ cpu = ipipe_processor_id();
+ if (!cpumask_test_and_set_cpu(cpu, &__ipipe_cpu_lock_map)) {
+ while (test_and_set_bit(0, &__ipipe_critical_lock)) {
+ n = 0;
+ hard_local_irq_enable();
+
+ do
+ cpu_relax();
+ while (++n < cpu);
+
+ hard_local_irq_disable();
+ }
+restart:
+ online = *cpu_online_mask;
+ raw_spin_lock(&__ipipe_cpu_barrier);
+
+ __ipipe_cpu_sync = syncfn;
+
+ cpumask_clear(&__ipipe_cpu_pass_map);
+ cpumask_set_cpu(cpu, &__ipipe_cpu_pass_map);
+
+ /*
+ * Send the sync IPI to all processors but the current
+ * one.
+ */
+ cpumask_andnot(&allbutself, &online, &__ipipe_cpu_pass_map);
+ ipipe_send_ipi(IPIPE_CRITICAL_IPI, allbutself);
+ loops = IPIPE_CRITICAL_TIMEOUT;
+
+ while (!cpumask_equal(&__ipipe_cpu_sync_map, &allbutself)) {
+ if (--loops > 0) {
+ cpu_relax();
+ continue;
+ }
+ /*
+ * We ran into a deadlock due to a contended
+ * rwlock. Cancel this round and retry.
+ */
+ __ipipe_cpu_sync = NULL;
+
+ raw_spin_unlock(&__ipipe_cpu_barrier);
+ /*
+ * Ensure all CPUs consumed the IPI to avoid
+ * running __ipipe_cpu_sync prematurely. This
+ * usually resolves the deadlock reason too.
+ */
+ while (!cpumask_equal(&online, &__ipipe_cpu_pass_map))
+ cpu_relax();
+
+ goto restart;
+ }
+ }
+
+ atomic_inc(&__ipipe_critical_count);
+
+#endif /* CONFIG_SMP */
+
+ return flags;
+}
+EXPORT_SYMBOL_GPL(ipipe_critical_enter);
+
+void ipipe_critical_exit(unsigned long flags)
+{
+ if (num_online_cpus() == 1) {
+ hard_local_irq_restore(flags);
+ return;
+ }
+
+#ifdef CONFIG_SMP
+ if (atomic_dec_and_test(&__ipipe_critical_count)) {
+ raw_spin_unlock(&__ipipe_cpu_barrier);
+ while (!cpumask_empty(&__ipipe_cpu_sync_map))
+ cpu_relax();
+ cpumask_clear_cpu(ipipe_processor_id(), &__ipipe_cpu_lock_map);
+ clear_bit(0, &__ipipe_critical_lock);
+ smp_mb__after_atomic();
+ }
+#endif /* CONFIG_SMP */
+
+ hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ipipe_critical_exit);
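+
+/*
+ * Illustrative usage sketch (hypothetical sync routine): the
+ * enter/exit pair brackets a section which must run while every other
+ * online CPU spins in __ipipe_do_critical_sync(), optionally running
+ * the routine passed to ipipe_critical_enter() there:
+ *
+ *     flags = ipipe_critical_enter(my_sync_handler);
+ *     ... update globally visible pipeline state ...
+ *     ipipe_critical_exit(flags);
+ *
+ * ipipe_select_timers() uses exactly this pattern around
+ * ipipe_timer_request_sync().
+ */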
+
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+
+void ipipe_root_only(void)
+{
+ struct ipipe_domain *this_domain;
+ unsigned long flags;
+
+ flags = hard_smp_local_irq_save();
+
+ this_domain = __ipipe_current_domain;
+ if (likely(this_domain == ipipe_root_domain &&
+ !test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status))) {
+ hard_smp_local_irq_restore(flags);
+ return;
+ }
+
+ if (!__this_cpu_read(ipipe_percpu.context_check)) {
+ hard_smp_local_irq_restore(flags);
+ return;
+ }
+
+ hard_smp_local_irq_restore(flags);
+
+ ipipe_prepare_panic();
+ ipipe_trace_panic_freeze();
+
+ if (this_domain != ipipe_root_domain)
+ pr_err("I-pipe: Detected illicit call from head domain '%s'\n"
+ " into a regular Linux service\n",
+ this_domain->name);
+ else
+ pr_err("I-pipe: Detected stalled head domain, "
+ "probably caused by a bug.\n"
+ " A critical section may have been "
+ "left unterminated.\n");
+ dump_stack();
+ ipipe_trace_panic_dump();
+}
+EXPORT_SYMBOL(ipipe_root_only);
+
+#endif /* CONFIG_IPIPE_DEBUG_CONTEXT */
+
+#if defined(CONFIG_IPIPE_DEBUG_INTERNAL) && defined(CONFIG_SMP)
+
+unsigned long notrace __ipipe_cpu_get_offset(void)
+{
+ struct ipipe_domain *this_domain;
+ unsigned long flags;
+ bool bad = false;
+
+ flags = hard_local_irq_save_notrace();
+ if (raw_irqs_disabled_flags(flags))
+ goto out;
+
+ /*
+ * Only the root domain may implement preemptive CPU migration
+ * of tasks, so anything above in the pipeline should be fine.
+ * CAUTION: we want open coded access to the current domain,
+ * don't use __ipipe_current_domain here, this would recurse
+ * indefinitely.
+ */
+ this_domain = raw_cpu_read(ipipe_percpu.curr)->domain;
+ if (this_domain != ipipe_root_domain)
+ goto out;
+
+ /*
+ * Since we run on the root stage with hard irqs enabled, we
+ * need preemption to be disabled. Otherwise, our caller may
+ * end up accessing the wrong per-cpu variable instance due to
+ * CPU migration, complain loudly.
+ */
+ if (preempt_count() == 0 && !irqs_disabled())
+ bad = true;
+out:
+ hard_local_irq_restore_notrace(flags);
+
+ WARN_ON_ONCE(bad);
+
+ return __my_cpu_offset;
+}
+EXPORT_SYMBOL(__ipipe_cpu_get_offset);
+
+void __ipipe_spin_unlock_debug(unsigned long flags)
+{
+ /*
+ * We catch a nasty issue where spin_unlock_irqrestore() on a
+ * regular kernel spinlock is about to re-enable hw interrupts
+ * in a section entered with hw irqs off. This is clearly the
+ * sign of a massive breakage coming. Usual suspect is a
+ * regular spinlock which was overlooked, used within a
+ * section which must run with hw irqs disabled.
+ */
+ IPIPE_WARN_ONCE(!raw_irqs_disabled_flags(flags) && hard_irqs_disabled());
+}
+EXPORT_SYMBOL(__ipipe_spin_unlock_debug);
+
+#endif /* CONFIG_IPIPE_DEBUG_INTERNAL && CONFIG_SMP */
+
+void ipipe_prepare_panic(void)
+{
+#ifdef CONFIG_PRINTK
+ __ipipe_printk_bypass = 1;
+#endif
+ ipipe_context_check_off();
+}
+EXPORT_SYMBOL_GPL(ipipe_prepare_panic);
+
+static void __ipipe_do_work(unsigned int virq, void *cookie)
+{
+ struct ipipe_work_header *work;
+ unsigned long flags;
+ void *curr, *tail;
+ int cpu;
+
+ /*
+ * Work is dispatched in enqueuing order. This interrupt
+ * context can't migrate to another CPU.
+ */
+ cpu = smp_processor_id();
+ curr = per_cpu(work_buf, cpu);
+
+ for (;;) {
+ flags = hard_local_irq_save();
+ tail = per_cpu(work_tail, cpu);
+ if (curr == tail) {
+ per_cpu(work_tail, cpu) = per_cpu(work_buf, cpu);
+ hard_local_irq_restore(flags);
+ return;
+ }
+ work = curr;
+ curr += work->size;
+ hard_local_irq_restore(flags);
+ work->handler(work);
+ }
+}
+
+void __ipipe_post_work_root(struct ipipe_work_header *work)
+{
+ unsigned long flags;
+ void *tail;
+ int cpu;
+
+ /*
+ * Subtle: we want to use the head stall/unstall operators,
+ * not the hard_* routines to protect against races. This way,
+ * we ensure that a root-based caller will trigger the virq
+ * handling immediately when unstalling the head stage, as a
+ * result of calling __ipipe_sync_pipeline() under the hood.
+ */
+ flags = ipipe_test_and_stall_head();
+ cpu = ipipe_processor_id();
+ tail = per_cpu(work_tail, cpu);
+
+ if (unlikely((unsigned char *)tail + work->size >=
+ per_cpu(work_buf, cpu) + WORKBUF_SIZE)) {
+ static volatile unsigned long once;
+ if (!test_and_set_bit(0, &once)) {
+ ipipe_prepare_panic();
+ pr_err("I-pipe root work queue overflow! System may be unstable now.\n");
+ dump_stack();
+ }
+ goto out;
+ }
+
+ /* Work handling is deferred, so data has to be copied. */
+ memcpy(tail, work, work->size);
+ per_cpu(work_tail, cpu) = tail + work->size;
+ ipipe_post_irq_root(__ipipe_work_virq);
+out:
+ ipipe_restore_head(flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_post_work_root);
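+
+/*
+ * Illustrative sketch of the deferred work API (hypothetical work
+ * descriptor): the caller embeds a struct ipipe_work_header at the
+ * start of its request block, then posts it for execution in root
+ * context; the block is copied, so it may live on the stack:
+ *
+ *     struct my_work {
+ *             struct ipipe_work_header head;
+ *             int arg;
+ *     };
+ *
+ *     static void my_work_handler(struct ipipe_work_header *h)
+ *     {
+ *             struct my_work *w = container_of(h, struct my_work, head);
+ *             ...
+ *     }
+ *
+ *     struct my_work w = {
+ *             .head.size = sizeof(w),
+ *             .head.handler = my_work_handler,
+ *             .arg = 42,
+ *     };
+ *     __ipipe_post_work_root(&w.head);
+ */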
+
+void __weak __ipipe_arch_share_current(int flags)
+{
+}
+
+void __ipipe_share_current(int flags)
+{
+ ipipe_root_only();
+
+ __ipipe_arch_share_current(flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_share_current);
+
+bool __weak ipipe_cpuidle_control(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ /*
+ * By default, always deny entering sleep state if this
+ * entails stopping the timer (i.e. C3STOP misfeature),
+ * Xenomai could not deal with this case.
+ */
+ if (state && (state->flags & CPUIDLE_FLAG_TIMER_STOP))
+ return false;
+
+ /* Otherwise, allow switching to idle state. */
+ return true;
+}
+
+bool ipipe_enter_cpuidle(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ struct ipipe_percpu_domain_data *p;
+
+ WARN_ON_ONCE(!irqs_disabled());
+
+ hard_local_irq_disable();
+ p = ipipe_this_cpu_root_context();
+
+ /*
+ * Pending IRQ(s) waiting for delivery to the root stage, or
+ * the arbitrary decision of a co-kernel may deny the
+ * transition to a deeper C-state. Note that we return from
+ * this call with hard irqs off, so that we won't allow any
+ * interrupt to sneak into the IRQ log until we reach the
+ * processor idling code, or leave the CPU idle framework
+ * without sleeping.
+ */
+ return !__ipipe_ipending_p(p) && ipipe_cpuidle_control(dev, state);
+}
+
+#if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || defined(CONFIG_PROVE_LOCKING) || \
+ defined(CONFIG_PREEMPT_VOLUNTARY) || defined(CONFIG_IPIPE_DEBUG_CONTEXT)
+void __ipipe_uaccess_might_fault(void)
+{
+ struct ipipe_percpu_domain_data *pdd;
+ struct ipipe_domain *ipd;
+ unsigned long flags;
+
+ flags = hard_local_irq_save();
+ ipd = __ipipe_current_domain;
+ if (ipd == ipipe_root_domain) {
+ hard_local_irq_restore(flags);
+ might_fault();
+ return;
+ }
+
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+ pdd = ipipe_this_cpu_context(ipd);
+ WARN_ON_ONCE(hard_irqs_disabled_flags(flags)
+ || test_bit(IPIPE_STALL_FLAG, &pdd->status));
+#else /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+ (void)pdd;
+#endif /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+ hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_uaccess_might_fault);
+#endif
diff --git a/kernel/ipipe/timer.c b/kernel/ipipe/timer.c
new file mode 100644
index 000000000000..83b3c94d92b3
--- /dev/null
+++ b/kernel/ipipe/timer.c
@@ -0,0 +1,660 @@
+/* -*- linux-c -*-
+ * linux/kernel/ipipe/timer.c
+ *
+ * Copyright (C) 2012 Gilles Chanteperdrix
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * I-pipe timer request interface.
+ */
+#include <linux/ipipe.h>
+#include <linux/percpu.h>
+#include <linux/irqdesc.h>
+#include <linux/cpumask.h>
+#include <linux/spinlock.h>
+#include <linux/ipipe_tickdev.h>
+#include <linux/interrupt.h>
+#include <linux/export.h>
+
+unsigned long __ipipe_hrtimer_freq;
+
+static LIST_HEAD(timers);
+static IPIPE_DEFINE_SPINLOCK(lock);
+
+static DEFINE_PER_CPU(struct ipipe_timer *, percpu_timer);
+
+/*
+ * Default request method: switch to oneshot mode if supported.
+ */
+static void ipipe_timer_default_request(struct ipipe_timer *timer, int steal)
+{
+ struct clock_event_device *evtdev = timer->host_timer;
+
+ if (!(evtdev->features & CLOCK_EVT_FEAT_ONESHOT))
+ return;
+
+ if (clockevent_state_oneshot(evtdev) ||
+ clockevent_state_oneshot_stopped(evtdev))
+ timer->orig_mode = CLOCK_EVT_MODE_ONESHOT;
+ else {
+ if (clockevent_state_periodic(evtdev))
+ timer->orig_mode = CLOCK_EVT_MODE_PERIODIC;
+ else if (clockevent_state_shutdown(evtdev))
+ timer->orig_mode = CLOCK_EVT_MODE_SHUTDOWN;
+ else
+ timer->orig_mode = CLOCK_EVT_MODE_UNUSED;
+ evtdev->set_state_oneshot(evtdev);
+ evtdev->set_next_event(timer->freq / HZ, evtdev);
+ }
+}
+
+/*
+ * Default release method: return the timer to the mode it had when
+ * starting.
+ */
+static void ipipe_timer_default_release(struct ipipe_timer *timer)
+{
+ struct clock_event_device *evtdev = timer->host_timer;
+
+ switch (timer->orig_mode) {
+ case CLOCK_EVT_MODE_SHUTDOWN:
+ evtdev->set_state_shutdown(evtdev);
+ break;
+ case CLOCK_EVT_MODE_PERIODIC:
+ evtdev->set_state_periodic(evtdev);
+ /* fall through */
+ case CLOCK_EVT_MODE_ONESHOT:
+ evtdev->set_next_event(timer->freq / HZ, evtdev);
+ break;
+ }
+}
+
+static int get_dev_mode(struct clock_event_device *evtdev)
+{
+ if (clockevent_state_oneshot(evtdev) ||
+ clockevent_state_oneshot_stopped(evtdev))
+ return CLOCK_EVT_MODE_ONESHOT;
+
+ if (clockevent_state_periodic(evtdev))
+ return CLOCK_EVT_MODE_PERIODIC;
+
+ if (clockevent_state_shutdown(evtdev))
+ return CLOCK_EVT_MODE_SHUTDOWN;
+
+ return CLOCK_EVT_MODE_UNUSED;
+}
+
+void ipipe_host_timer_register(struct clock_event_device *evtdev)
+{
+ struct ipipe_timer *timer = evtdev->ipipe_timer;
+
+ if (timer == NULL)
+ return;
+
+ timer->orig_mode = CLOCK_EVT_MODE_UNUSED;
+
+ if (timer->request == NULL)
+ timer->request = ipipe_timer_default_request;
+
+ /*
+ * By default, use the same method as the Linux timer; on ARM at
+ * least, most set_next_event methods are safe to call from the
+ * Xenomai domain anyway.
+ */
+ if (timer->set == NULL) {
+ timer->timer_set = evtdev;
+ timer->set = (typeof(timer->set))evtdev->set_next_event;
+ }
+
+ if (timer->release == NULL)
+ timer->release = ipipe_timer_default_release;
+
+ if (timer->name == NULL)
+ timer->name = evtdev->name;
+
+ if (timer->rating == 0)
+ timer->rating = evtdev->rating;
+
+ timer->freq = (1000000000ULL * evtdev->mult) >> evtdev->shift;
+
+ if (timer->min_delay_ticks == 0)
+ timer->min_delay_ticks =
+ (evtdev->min_delta_ns * evtdev->mult) >> evtdev->shift;
+
+ if (timer->max_delay_ticks == 0)
+ timer->max_delay_ticks =
+ (evtdev->max_delta_ns * evtdev->mult) >> evtdev->shift;
+
+ if (timer->cpumask == NULL)
+ timer->cpumask = evtdev->cpumask;
+
+ timer->host_timer = evtdev;
+
+ ipipe_timer_register(timer);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+void ipipe_host_timer_cleanup(struct clock_event_device *evtdev)
+{
+ struct ipipe_timer *timer = evtdev->ipipe_timer;
+ unsigned long flags;
+
+ if (timer == NULL)
+ return;
+
+ raw_spin_lock_irqsave(&lock, flags);
+ list_del(&timer->link);
+ raw_spin_unlock_irqrestore(&lock, flags);
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+/*
+ * Register a timer: timers are maintained in a list sorted by rating.
+ */
+void ipipe_timer_register(struct ipipe_timer *timer)
+{
+ struct ipipe_timer *t;
+ unsigned long flags;
+
+ if (timer->timer_set == NULL)
+ timer->timer_set = timer;
+
+ if (timer->cpumask == NULL)
+ timer->cpumask = cpumask_of(smp_processor_id());
+
+ raw_spin_lock_irqsave(&lock, flags);
+
+ list_for_each_entry(t, &timers, link) {
+ if (t->rating <= timer->rating) {
+ __list_add(&timer->link, t->link.prev, &t->link);
+ goto done;
+ }
+ }
+ list_add_tail(&timer->link, &timers);
+ done:
+ raw_spin_unlock_irqrestore(&lock, flags);
+}
+
+static void ipipe_timer_request_sync(void)
+{
+ struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+ struct clock_event_device *evtdev;
+ int steal;
+
+ if (!timer)
+ return;
+
+ evtdev = timer->host_timer;
+ steal = evtdev != NULL && !clockevent_state_detached(evtdev);
+ timer->request(timer, steal);
+}
+
+static void config_pcpu_timer(struct ipipe_timer *t, unsigned hrclock_freq)
+{
+ unsigned long long tmp;
+ unsigned hrtimer_freq;
+
+ if (__ipipe_hrtimer_freq != t->freq)
+ __ipipe_hrtimer_freq = t->freq;
+
+ hrtimer_freq = t->freq;
+ if (__ipipe_hrclock_freq > UINT_MAX)
+ hrtimer_freq /= 1000;
+
+ t->c2t_integ = hrtimer_freq / hrclock_freq;
+ tmp = (((unsigned long long)
+ (hrtimer_freq % hrclock_freq)) << 32)
+ + hrclock_freq - 1;
+ do_div(tmp, hrclock_freq);
+ t->c2t_frac = tmp;
+}
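+
+/*
+ * config_pcpu_timer() precomputes a 32.32 fixed-point ratio so that
+ * ipipe_timer_set() can convert a delay expressed in hrclock ticks
+ * into timer ticks without dividing on the hot path:
+ *
+ *     tdelay = cdelay * c2t_integ + (((u64)cdelay * c2t_frac) >> 32)
+ *
+ * For instance, with hrclock_freq = 1 MHz and a 3 MHz timer,
+ * c2t_integ = 3 and c2t_frac = 0, so a 1000-tick clock delay maps to
+ * 3000 timer ticks.
+ */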
+
+/* Set up a timer as per-cpu timer for ipipe */
+static void install_pcpu_timer(unsigned cpu, unsigned hrclock_freq,
+ struct ipipe_timer *t)
+{
+ per_cpu(ipipe_percpu.hrtimer_irq, cpu) = t->irq;
+ per_cpu(percpu_timer, cpu) = t;
+ config_pcpu_timer(t, hrclock_freq);
+}
+
+static void select_root_only_timer(unsigned cpu, unsigned hrclock_khz,
+ const struct cpumask *mask,
+				   struct ipipe_timer *t)
+{
+ unsigned icpu;
+ struct clock_event_device *evtdev;
+
+ /*
+ * If no ipipe-supported CPU shares an interrupt with the
+ * timer, we do not need to care about it.
+ */
+ for_each_cpu(icpu, mask) {
+ if (t->irq == per_cpu(ipipe_percpu.hrtimer_irq, icpu)) {
+ evtdev = t->host_timer;
+ if (evtdev && clockevent_state_shutdown(evtdev))
+ continue;
+ goto found;
+ }
+ }
+
+ return;
+
+found:
+ install_pcpu_timer(cpu, hrclock_khz, t);
+}
+
+/*
+ * Choose per-cpu timers with the highest rating by traversing the
+ * rating-sorted list for each CPU.
+ */
+int ipipe_select_timers(const struct cpumask *mask)
+{
+ unsigned hrclock_freq;
+ unsigned long long tmp;
+ struct ipipe_timer *t;
+ struct clock_event_device *evtdev;
+ unsigned long flags;
+ unsigned cpu;
+ cpumask_var_t fixup;
+
+ if (!__ipipe_hrclock_ok()) {
+ printk("I-pipe: high-resolution clock not working\n");
+ return -ENODEV;
+ }
+
+ if (__ipipe_hrclock_freq > UINT_MAX) {
+ tmp = __ipipe_hrclock_freq;
+ do_div(tmp, 1000);
+ hrclock_freq = tmp;
+ } else
+ hrclock_freq = __ipipe_hrclock_freq;
+
+
+ if (!zalloc_cpumask_var(&fixup, GFP_KERNEL)) {
+ WARN_ON(1);
+ return -ENODEV;
+ }
+
+ raw_spin_lock_irqsave(&lock, flags);
+
+ /* First, choose timers for the CPUs handled by ipipe */
+ for_each_cpu(cpu, mask) {
+ list_for_each_entry(t, &timers, link) {
+ if (!cpumask_test_cpu(cpu, t->cpumask))
+ continue;
+
+ evtdev = t->host_timer;
+ if (evtdev && clockevent_state_shutdown(evtdev))
+ continue;
+ goto found;
+ }
+
+ printk("I-pipe: could not find timer for cpu #%d\n",
+ cpu);
+ goto err_remove_all;
+found:
+ install_pcpu_timer(cpu, hrclock_freq, t);
+ }
+
+ /*
+ * Second, check if we need to fix up any CPUs not supported
+ * by ipipe (but by Linux) whose interrupt may need to be
+ * forwarded because they have the same IRQ as an ipipe-enabled
+ * timer.
+ */
+ cpumask_andnot(fixup, cpu_online_mask, mask);
+
+ for_each_cpu(cpu, fixup) {
+ list_for_each_entry(t, &timers, link) {
+ if (!cpumask_test_cpu(cpu, t->cpumask))
+ continue;
+
+ select_root_only_timer(cpu, hrclock_freq, mask, t);
+ }
+ }
+
+ raw_spin_unlock_irqrestore(&lock, flags);
+
+ free_cpumask_var(fixup);
+ flags = ipipe_critical_enter(ipipe_timer_request_sync);
+ ipipe_timer_request_sync();
+ ipipe_critical_exit(flags);
+
+ return 0;
+
+err_remove_all:
+ raw_spin_unlock_irqrestore(&lock, flags);
+ free_cpumask_var(fixup);
+
+ for_each_cpu(cpu, mask) {
+ per_cpu(ipipe_percpu.hrtimer_irq, cpu) = -1;
+ per_cpu(percpu_timer, cpu) = NULL;
+ }
+ __ipipe_hrtimer_freq = 0;
+
+ return -ENODEV;
+}
+
+static void ipipe_timer_release_sync(void)
+{
+ struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+ if (timer)
+ timer->release(timer);
+}
+
+void ipipe_timers_release(void)
+{
+ unsigned long flags;
+ unsigned cpu;
+
+ flags = ipipe_critical_enter(ipipe_timer_release_sync);
+ ipipe_timer_release_sync();
+ ipipe_critical_exit(flags);
+
+ for_each_online_cpu(cpu) {
+ per_cpu(ipipe_percpu.hrtimer_irq, cpu) = -1;
+ per_cpu(percpu_timer, cpu) = NULL;
+ __ipipe_hrtimer_freq = 0;
+ }
+}
+
+static void __ipipe_ack_hrtimer_irq(struct irq_desc *desc)
+{
+ struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+ /*
+ * Pseudo-IRQs like pipelined IPIs have no descriptor, we have
+ * to check for this.
+ */
+ if (desc)
+ desc->ipipe_ack(desc);
+
+ if (timer->ack)
+ timer->ack();
+
+ if (desc)
+ desc->ipipe_end(desc);
+}
+
+static int do_set_oneshot(struct clock_event_device *cdev)
+{
+ struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+ timer->orig_set_state_oneshot(cdev);
+ timer->mode_handler(CLOCK_EVT_MODE_ONESHOT, cdev);
+
+ return 0;
+}
+
+static int do_set_oneshot_stopped(struct clock_event_device *cdev)
+{
+ struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+ timer->mode_handler(CLOCK_EVT_MODE_SHUTDOWN, cdev);
+
+ return 0;
+}
+
+static int do_set_periodic(struct clock_event_device *cdev)
+{
+ struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+ timer->mode_handler(CLOCK_EVT_MODE_PERIODIC, cdev);
+
+ return 0;
+}
+
+static int do_set_shutdown(struct clock_event_device *cdev)
+{
+ struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+ timer->mode_handler(CLOCK_EVT_MODE_SHUTDOWN, cdev);
+
+ return 0;
+}
+
+int clockevents_program_event(struct clock_event_device *dev,
+ ktime_t expires, bool force);
+
+struct grab_timer_data {
+ void (*tick_handler)(void);
+ void (*emumode)(enum clock_event_mode mode,
+ struct clock_event_device *cdev);
+ int (*emutick)(unsigned long evt,
+ struct clock_event_device *cdev);
+ int retval;
+};
+
+static void grab_timer(void *arg)
+{
+ struct grab_timer_data *data = arg;
+ struct clock_event_device *evtdev;
+ struct ipipe_timer *timer;
+ struct irq_desc *desc;
+ unsigned long flags;
+ int steal, ret;
+
+ flags = hard_local_irq_save();
+
+ timer = this_cpu_read(percpu_timer);
+ evtdev = timer->host_timer;
+ ret = ipipe_request_irq(ipipe_head_domain, timer->irq,
+ (ipipe_irq_handler_t)data->tick_handler,
+ NULL, __ipipe_ack_hrtimer_irq);
+ if (ret < 0 && ret != -EBUSY) {
+ hard_local_irq_restore(flags);
+ data->retval = ret;
+ return;
+ }
+
+ steal = !clockevent_state_detached(evtdev);
+ if (steal && evtdev->ipipe_stolen == 0) {
+ timer->real_mult = evtdev->mult;
+ timer->real_shift = evtdev->shift;
+ timer->orig_set_state_periodic = evtdev->set_state_periodic;
+ timer->orig_set_state_oneshot = evtdev->set_state_oneshot;
+ timer->orig_set_state_oneshot_stopped = evtdev->set_state_oneshot_stopped;
+ timer->orig_set_state_shutdown = evtdev->set_state_shutdown;
+ timer->orig_set_next_event = evtdev->set_next_event;
+ timer->mode_handler = data->emumode;
+ evtdev->mult = 1;
+ evtdev->shift = 0;
+ evtdev->max_delta_ns = UINT_MAX;
+ if (timer->orig_set_state_periodic)
+ evtdev->set_state_periodic = do_set_periodic;
+ if (timer->orig_set_state_oneshot)
+ evtdev->set_state_oneshot = do_set_oneshot;
+ if (timer->orig_set_state_oneshot_stopped)
+ evtdev->set_state_oneshot_stopped = do_set_oneshot_stopped;
+ if (timer->orig_set_state_shutdown)
+ evtdev->set_state_shutdown = do_set_shutdown;
+ evtdev->set_next_event = data->emutick;
+ evtdev->ipipe_stolen = 1;
+ }
+
+ hard_local_irq_restore(flags);
+
+ data->retval = get_dev_mode(evtdev);
+
+ desc = irq_to_desc(timer->irq);
+ if (desc && irqd_irq_disabled(&desc->irq_data))
+ ipipe_enable_irq(timer->irq);
+
+ if (evtdev->ipipe_stolen && clockevent_state_oneshot(evtdev)) {
+ ret = clockevents_program_event(evtdev,
+ evtdev->next_event, true);
+ if (ret)
+ data->retval = ret;
+ }
+}
+
+int ipipe_timer_start(void (*tick_handler)(void),
+ void (*emumode)(enum clock_event_mode mode,
+ struct clock_event_device *cdev),
+ int (*emutick)(unsigned long evt,
+ struct clock_event_device *cdev),
+ unsigned int cpu)
+{
+ struct grab_timer_data data;
+ int ret;
+
+ data.tick_handler = tick_handler;
+ data.emutick = emutick;
+ data.emumode = emumode;
+ data.retval = -EINVAL;
+ ret = smp_call_function_single(cpu, grab_timer, &data, true);
+
+ return ret ?: data.retval;
+}
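+
+/*
+ * Illustrative sketch of the take-over sequence a co-kernel follows
+ * (hypothetical handler names):
+ *
+ *     ipipe_select_timers(cpu_online_mask);
+ *     for_each_online_cpu(cpu)
+ *             ipipe_timer_start(my_tick_handler, my_emumode,
+ *                               my_emutick, cpu);
+ *     ...
+ *     for_each_online_cpu(cpu)
+ *             ipipe_timer_stop(cpu);
+ *     ipipe_timers_release();
+ */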
+
+static void release_timer(void *arg)
+{
+ struct clock_event_device *evtdev;
+ struct ipipe_timer *timer;
+ struct irq_desc *desc;
+ unsigned long flags;
+
+ flags = hard_local_irq_save();
+
+ timer = this_cpu_read(percpu_timer);
+
+ desc = irq_to_desc(timer->irq);
+ if (desc && irqd_irq_disabled(&desc->irq_data))
+ ipipe_disable_irq(timer->irq);
+
+ ipipe_free_irq(ipipe_head_domain, timer->irq);
+
+ evtdev = timer->host_timer;
+ if (evtdev && evtdev->ipipe_stolen) {
+ evtdev->mult = timer->real_mult;
+ evtdev->shift = timer->real_shift;
+ evtdev->set_state_periodic = timer->orig_set_state_periodic;
+ evtdev->set_state_oneshot = timer->orig_set_state_oneshot;
+ evtdev->set_state_oneshot_stopped = timer->orig_set_state_oneshot_stopped;
+ evtdev->set_state_shutdown = timer->orig_set_state_shutdown;
+ evtdev->set_next_event = timer->orig_set_next_event;
+ evtdev->ipipe_stolen = 0;
+ hard_local_irq_restore(flags);
+ if (clockevent_state_oneshot(evtdev))
+ clockevents_program_event(evtdev,
+ evtdev->next_event, true);
+ } else
+ hard_local_irq_restore(flags);
+}
+
+void ipipe_timer_stop(unsigned int cpu)
+{
+ smp_call_function_single(cpu, release_timer, NULL, true);
+}
+
+void ipipe_timer_set(unsigned long cdelay)
+{
+ unsigned long tdelay;
+ struct ipipe_timer *t;
+
+ t = __ipipe_raw_cpu_read(percpu_timer);
+
+ /*
+ * Even though some architectures may use a 64-bit delay here, the
+ * generic code voluntarily limits it to 32 bits; 4 billion ticks
+ * should be enough for now. Should a timer need more, an extra call
+ * to the tick handler would simply occur after 4 billion ticks.
+ * Note that the clamp below is intentionally left disabled in this
+ * patch.
+ */
+// if (cdelay > UINT_MAX)
+// cdelay = UINT_MAX;
+
+ tdelay = cdelay;
+ if (t->c2t_integ != 1)
+ tdelay *= t->c2t_integ;
+ if (t->c2t_frac)
+ tdelay += ((unsigned long long)cdelay * t->c2t_frac) >> 32;
+ if (tdelay < t->min_delay_ticks)
+ tdelay = t->min_delay_ticks;
+ if (tdelay > t->max_delay_ticks)
+ tdelay = t->max_delay_ticks;
+
+ /*
+ * This patch intentionally ignores set() failures: the original
+ * fallback, which raised t->irq when the delay could not be
+ * programmed, is disabled here.
+ */
+ t->set(tdelay, t->timer_set);
+}
+EXPORT_SYMBOL_GPL(ipipe_timer_set);
+
+const char *ipipe_timer_name(void)
+{
+ return per_cpu(percpu_timer, 0)->name;
+}
+EXPORT_SYMBOL_GPL(ipipe_timer_name);
+
+unsigned ipipe_timer_ns2ticks(struct ipipe_timer *timer, unsigned ns)
+{
+ unsigned long long tmp;
+ BUG_ON(!timer->freq);
+ tmp = (unsigned long long)ns * timer->freq;
+ do_div(tmp, 1000000000);
+ return tmp;
+}
+
+#ifdef CONFIG_IPIPE_HAVE_HOSTRT
+/* NOTE: The event receiver is responsible for providing proper locking. */
+void ipipe_update_hostrt(struct timekeeper *tk)
+{
+ struct tk_read_base *tkr = &tk->tkr_mono;
+ struct clocksource *clock = tkr->clock;
+ struct ipipe_hostrt_data data;
+ struct timespec xt;
+
+ if (clock != &__ipipe_hostrt_clock)
+ return;
+
+ xt.tv_sec = tk->xtime_sec;
+ xt.tv_nsec = (long)(tkr->xtime_nsec >> tkr->shift);
+ ipipe_root_only();
+ data.live = 1;
+ data.cycle_last = tkr->cycle_last;
+ data.mask = clock->mask;
+ data.mult = tkr->mult;
+ data.shift = tkr->shift;
+ data.wall_time_sec = xt.tv_sec;
+ data.wall_time_nsec = xt.tv_nsec;
+ data.wall_to_monotonic.tv_sec = tk->wall_to_monotonic.tv_sec;
+ data.wall_to_monotonic.tv_nsec = tk->wall_to_monotonic.tv_nsec;
+ __ipipe_notify_kevent(IPIPE_KEVT_HOSTRT, &data);
+}
+
+#endif /* CONFIG_IPIPE_HAVE_HOSTRT */
+
+int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
+ bool force);
+
+void __ipipe_timer_refresh_freq(unsigned int hrclock_freq)
+{
+ struct ipipe_timer *t = __ipipe_raw_cpu_read(percpu_timer);
+ unsigned long flags;
+
+ if (t && t->refresh_freq) {
+ t->freq = t->refresh_freq();
+ flags = hard_local_irq_save();
+ config_pcpu_timer(t, hrclock_freq);
+ hard_local_irq_restore(flags);
+ clockevents_program_event(t->host_timer,
+ t->host_timer->next_event, false);
+ }
+}
+EXPORT_SYMBOL(ipipe_select_timers);
+EXPORT_SYMBOL(ipipe_timers_release);
+EXPORT_SYMBOL(ipipe_timer_start);
+EXPORT_SYMBOL(ipipe_timer_stop);
diff --git a/kernel/ipipe/tracer.c b/kernel/ipipe/tracer.c
new file mode 100644
index 000000000000..181d4df20a6a
--- /dev/null
+++ b/kernel/ipipe/tracer.c
@@ -0,0 +1,1524 @@
+/* -*- linux-c -*-
+ * kernel/ipipe/tracer.c
+ *
+ * Copyright (C) 2005 Luotao Fu.
+ * 2005-2008 Jan Kiszka.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/kallsyms.h>
+#include <linux/kdebug.h>
+#include <linux/seq_file.h>
+#include <linux/proc_fs.h>
+#include <linux/ctype.h>
+#include <linux/vmalloc.h>
+#include <linux/pid.h>
+#include <linux/vermagic.h>
+#include <linux/sched.h>
+#include <linux/ipipe.h>
+#include <linux/ftrace.h>
+#include <linux/uaccess.h>
+
+#define IPIPE_TRACE_PATHS 4 /* <!> Do not lower below 3 */
+#define IPIPE_DEFAULT_ACTIVE 0
+#define IPIPE_DEFAULT_MAX 1
+#define IPIPE_DEFAULT_FROZEN 2
+
+#define IPIPE_TRACE_POINTS (1 << CONFIG_IPIPE_TRACE_SHIFT)
+#define WRAP_POINT_NO(point) ((point) & (IPIPE_TRACE_POINTS-1))
+
+#define IPIPE_DEFAULT_PRE_TRACE 10
+#define IPIPE_DEFAULT_POST_TRACE 10
+#define IPIPE_DEFAULT_BACK_TRACE 100
+
+#define IPIPE_DELAY_NOTE 1000 /* in nanoseconds */
+#define IPIPE_DELAY_WARN 10000 /* in nanoseconds */
+
+#define IPIPE_TFLG_NMI_LOCK 0x0001
+#define IPIPE_TFLG_NMI_HIT 0x0002
+#define IPIPE_TFLG_NMI_FREEZE_REQ 0x0004
+
+#define IPIPE_TFLG_HWIRQ_OFF 0x0100
+#define IPIPE_TFLG_FREEZING 0x0200
+#define IPIPE_TFLG_CURRDOM_SHIFT 10 /* bits 10..11: current domain */
+#define IPIPE_TFLG_CURRDOM_MASK 0x0C00
+#define IPIPE_TFLG_DOMSTATE_SHIFT 12 /* bits 12..15: domain stalled? */
+#define IPIPE_TFLG_DOMSTATE_BITS 1
+
+#define IPIPE_TFLG_DOMAIN_STALLED(point, n) \
+ (point->flags & (1 << (n + IPIPE_TFLG_DOMSTATE_SHIFT)))
+#define IPIPE_TFLG_CURRENT_DOMAIN(point) \
+ ((point->flags & IPIPE_TFLG_CURRDOM_MASK) >> IPIPE_TFLG_CURRDOM_SHIFT)
+
+struct ipipe_trace_point {
+ short type;
+ short flags;
+ unsigned long eip;
+ unsigned long parent_eip;
+ unsigned long v;
+ unsigned long long timestamp;
+};
+
+struct ipipe_trace_path {
+ volatile int flags;
+ int dump_lock; /* separated from flags due to cross-cpu access */
+ int trace_pos; /* next point to fill */
+ int begin, end; /* finalised path begin and end */
+ int post_trace; /* non-zero when in post-trace phase */
+ unsigned long long length; /* max path length in cycles */
+ unsigned long nmi_saved_eip; /* for deferred requests from NMIs */
+ unsigned long nmi_saved_parent_eip;
+ unsigned long nmi_saved_v;
+ struct ipipe_trace_point point[IPIPE_TRACE_POINTS];
+} ____cacheline_aligned_in_smp;
+
+enum ipipe_trace_type
+{
+ IPIPE_TRACE_FUNC = 0,
+ IPIPE_TRACE_BEGIN,
+ IPIPE_TRACE_END,
+ IPIPE_TRACE_FREEZE,
+ IPIPE_TRACE_SPECIAL,
+ IPIPE_TRACE_PID,
+ IPIPE_TRACE_EVENT,
+};
+
+#define IPIPE_TYPE_MASK 0x0007
+#define IPIPE_TYPE_BITS 3
+
+#ifdef CONFIG_IPIPE_TRACE_VMALLOC
+static DEFINE_PER_CPU(struct ipipe_trace_path *, trace_path);
+#else /* !CONFIG_IPIPE_TRACE_VMALLOC */
+static DEFINE_PER_CPU(struct ipipe_trace_path, trace_path[IPIPE_TRACE_PATHS]) =
+ { [0 ... IPIPE_TRACE_PATHS-1] = { .begin = -1, .end = -1 } };
+#endif /* CONFIG_IPIPE_TRACE_VMALLOC */
+
+int ipipe_trace_enable = 0;
+
+static DEFINE_PER_CPU(int, active_path) = { IPIPE_DEFAULT_ACTIVE };
+static DEFINE_PER_CPU(int, max_path) = { IPIPE_DEFAULT_MAX };
+static DEFINE_PER_CPU(int, frozen_path) = { IPIPE_DEFAULT_FROZEN };
+static IPIPE_DEFINE_SPINLOCK(global_path_lock);
+static int pre_trace = IPIPE_DEFAULT_PRE_TRACE;
+static int post_trace = IPIPE_DEFAULT_POST_TRACE;
+static int back_trace = IPIPE_DEFAULT_BACK_TRACE;
+static int verbose_trace = 1;
+static unsigned long trace_overhead;
+
+static unsigned long trigger_begin;
+static unsigned long trigger_end;
+
+static DEFINE_MUTEX(out_mutex);
+static struct ipipe_trace_path *print_path;
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+static struct ipipe_trace_path *panic_path;
+#endif /* CONFIG_IPIPE_TRACE_PANIC */
+static int print_pre_trace;
+static int print_post_trace;
+
+
+static long __ipipe_signed_tsc2us(long long tsc);
+static void
+__ipipe_trace_point_type(char *buf, struct ipipe_trace_point *point);
+static void __ipipe_print_symname(struct seq_file *m, unsigned long eip);
+
+static inline void store_states(struct ipipe_domain *ipd,
+ struct ipipe_trace_point *point, int pos)
+{
+ if (test_bit(IPIPE_STALL_FLAG, &ipipe_this_cpu_context(ipd)->status))
+ point->flags |= 1 << (pos + IPIPE_TFLG_DOMSTATE_SHIFT);
+
+ if (ipd == __ipipe_current_domain)
+ point->flags |= pos << IPIPE_TFLG_CURRDOM_SHIFT;
+}
+
+static notrace void
+__ipipe_store_domain_states(struct ipipe_trace_point *point)
+{
+ store_states(ipipe_root_domain, point, 0);
+ if (ipipe_head_domain != ipipe_root_domain)
+ store_states(ipipe_head_domain, point, 1);
+}
+
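+/*
+ * Pick the next per-cpu path slot which is neither holding the
+ * current worst case, nor the frozen snapshot, nor locked for a
+ * /proc dump.
+ */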
+static notrace int __ipipe_get_free_trace_path(int old, int cpu)
+{
+ int new_active = old;
+ struct ipipe_trace_path *tp;
+
+ do {
+ if (++new_active == IPIPE_TRACE_PATHS)
+ new_active = 0;
+ tp = &per_cpu(trace_path, cpu)[new_active];
+ } while (new_active == per_cpu(max_path, cpu) ||
+ new_active == per_cpu(frozen_path, cpu) ||
+ tp->dump_lock);
+
+ return new_active;
+}
+
+static notrace void
+__ipipe_migrate_pre_trace(struct ipipe_trace_path *new_tp,
+ struct ipipe_trace_path *old_tp, int old_pos)
+{
+ int i;
+
+ new_tp->trace_pos = pre_trace+1;
+
+ for (i = new_tp->trace_pos; i > 0; i--)
+ memcpy(&new_tp->point[WRAP_POINT_NO(new_tp->trace_pos-i)],
+ &old_tp->point[WRAP_POINT_NO(old_pos-i)],
+ sizeof(struct ipipe_trace_point));
+
+ /* mark the end (i.e. the point before point[0]) invalid */
+ new_tp->point[IPIPE_TRACE_POINTS-1].eip = 0;
+}
+
+static notrace struct ipipe_trace_path *
+__ipipe_trace_end(int cpu, struct ipipe_trace_path *tp, int pos)
+{
+ struct ipipe_trace_path *old_tp = tp;
+ long active = per_cpu(active_path, cpu);
+ unsigned long long length;
+
+ /* do we have a new worst case? */
+ length = tp->point[tp->end].timestamp -
+ tp->point[tp->begin].timestamp;
+ if (length > per_cpu(trace_path, cpu)[per_cpu(max_path, cpu)].length) {
+ /* we need protection here against other cpus trying
+ to start a proc dump */
+ raw_spin_lock(&global_path_lock);
+
+ /* active path holds new worst case */
+ tp->length = length;
+ per_cpu(max_path, cpu) = active;
+
+ /* find next unused trace path */
+ active = __ipipe_get_free_trace_path(active, cpu);
+
+ raw_spin_unlock(&global_path_lock);
+
+ tp = &per_cpu(trace_path, cpu)[active];
+
+ /* migrate last entries for pre-tracing */
+ __ipipe_migrate_pre_trace(tp, old_tp, pos);
+ }
+
+ return tp;
+}
+
+static notrace struct ipipe_trace_path *
+__ipipe_trace_freeze(int cpu, struct ipipe_trace_path *tp, int pos)
+{
+ struct ipipe_trace_path *old_tp = tp;
+ long active = per_cpu(active_path, cpu);
+ int n;
+
+ /* frozen paths have no core (begin=end) */
+ tp->begin = tp->end;
+
+ /* we need protection here against other cpus trying
+ * to set their frozen path or to start a proc dump */
+ raw_spin_lock(&global_path_lock);
+
+ per_cpu(frozen_path, cpu) = active;
+
+ /* find next unused trace path */
+ active = __ipipe_get_free_trace_path(active, cpu);
+
+ /* check if this is the first frozen path */
+ for_each_possible_cpu(n) {
+ if (n != cpu &&
+ per_cpu(trace_path, n)[per_cpu(frozen_path, n)].end >= 0)
+ tp->end = -1;
+ }
+
+ raw_spin_unlock(&global_path_lock);
+
+ tp = &per_cpu(trace_path, cpu)[active];
+
+ /* migrate last entries for pre-tracing */
+ __ipipe_migrate_pre_trace(tp, old_tp, pos);
+
+ return tp;
+}
+
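+/*
+ * Record one point on this CPU's active path. NMI reentry is caught
+ * through the per-path NMI lock; freeze requests raised from NMI
+ * context are deferred and replayed once the lock is dropped. When a
+ * path ends or freezes, it is handed over to __ipipe_trace_end() or
+ * __ipipe_trace_freeze() and a fresh path becomes active.
+ */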
+void notrace
+__ipipe_trace(enum ipipe_trace_type type, unsigned long eip,
+ unsigned long parent_eip, unsigned long v)
+{
+ struct ipipe_trace_path *tp, *old_tp;
+ int pos, next_pos, begin;
+ struct ipipe_trace_point *point;
+ unsigned long flags;
+ int cpu;
+
+ flags = hard_local_irq_save_notrace();
+
+ cpu = ipipe_processor_id();
+ restart:
+ tp = old_tp = &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)];
+
+ /* here starts a race window with NMIs - caught below */
+
+ /* check for NMI recursion */
+ if (unlikely(tp->flags & IPIPE_TFLG_NMI_LOCK)) {
+ tp->flags |= IPIPE_TFLG_NMI_HIT;
+
+ /* first freeze request from NMI context? */
+ if ((type == IPIPE_TRACE_FREEZE) &&
+ !(tp->flags & IPIPE_TFLG_NMI_FREEZE_REQ)) {
+ /* save arguments and mark deferred freezing */
+ tp->flags |= IPIPE_TFLG_NMI_FREEZE_REQ;
+ tp->nmi_saved_eip = eip;
+ tp->nmi_saved_parent_eip = parent_eip;
+ tp->nmi_saved_v = v;
+ }
+ return; /* no need for restoring flags inside IRQ */
+ }
+
+ /* clear NMI events and set lock (atomically per cpu) */
+ tp->flags = (tp->flags & ~(IPIPE_TFLG_NMI_HIT |
+ IPIPE_TFLG_NMI_FREEZE_REQ))
+ | IPIPE_TFLG_NMI_LOCK;
+
+ /* check active_path again - some nasty NMI may have switched
+ * it meanwhile */
+ if (unlikely(tp !=
+ &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)])) {
+ /* release lock on wrong path and restart */
+ tp->flags &= ~IPIPE_TFLG_NMI_LOCK;
+
+ /* there is no chance that the NMI got deferred
+ * => no need to check for pending freeze requests */
+ goto restart;
+ }
+
+ /* get the point buffer */
+ pos = tp->trace_pos;
+ point = &tp->point[pos];
+
+ /* store all trace point data */
+ point->type = type;
+ point->flags = hard_irqs_disabled_flags(flags) ? IPIPE_TFLG_HWIRQ_OFF : 0;
+ point->eip = eip;
+ point->parent_eip = parent_eip;
+ point->v = v;
+ ipipe_read_tsc(point->timestamp);
+
+ __ipipe_store_domain_states(point);
+
+ /* forward to next point buffer */
+ next_pos = WRAP_POINT_NO(pos+1);
+ tp->trace_pos = next_pos;
+
+ /* only mark beginning if we haven't started yet */
+ begin = tp->begin;
+ if (unlikely(type == IPIPE_TRACE_BEGIN) && (begin < 0))
+ tp->begin = pos;
+
+ /* end of critical path, start post-trace if not already started */
+ if (unlikely(type == IPIPE_TRACE_END) &&
+ (begin >= 0) && !tp->post_trace)
+ tp->post_trace = post_trace + 1;
+
+ /* freeze only if the slot is free and we are not already freezing */
+ if ((unlikely(type == IPIPE_TRACE_FREEZE) ||
+ (unlikely(eip >= trigger_begin && eip <= trigger_end) &&
+ type == IPIPE_TRACE_FUNC)) &&
+ per_cpu(trace_path, cpu)[per_cpu(frozen_path, cpu)].begin < 0 &&
+ !(tp->flags & IPIPE_TFLG_FREEZING)) {
+ tp->post_trace = post_trace + 1;
+ tp->flags |= IPIPE_TFLG_FREEZING;
+ }
+
+ /* enforce end of trace in case of overflow */
+ if (unlikely(WRAP_POINT_NO(next_pos + 1) == begin)) {
+ tp->end = pos;
+ goto enforce_end;
+ }
+
+ /* stop tracing this path if we are in post-trace and
+ * a) that phase is over now or
+ * b) a new TRACE_BEGIN came in but we are not freezing this path */
+ if (unlikely((tp->post_trace > 0) && ((--tp->post_trace == 0) ||
+ ((type == IPIPE_TRACE_BEGIN) &&
+ !(tp->flags & IPIPE_TFLG_FREEZING))))) {
+ /* store the path's end (i.e. excluding post-trace) */
+ tp->end = WRAP_POINT_NO(pos - post_trace + tp->post_trace);
+
+ enforce_end:
+ if (tp->flags & IPIPE_TFLG_FREEZING)
+ tp = __ipipe_trace_freeze(cpu, tp, pos);
+ else
+ tp = __ipipe_trace_end(cpu, tp, pos);
+
+ /* reset the active path, maybe already start a new one */
+ tp->begin = (type == IPIPE_TRACE_BEGIN) ?
+ WRAP_POINT_NO(tp->trace_pos - 1) : -1;
+ tp->end = -1;
+ tp->post_trace = 0;
+ tp->flags = 0;
+
+ /* update active_path not earlier to avoid races with NMIs */
+ per_cpu(active_path, cpu) = tp - per_cpu(trace_path, cpu);
+ }
+
+ /* we still have old_tp and point,
+ * let's reset NMI lock and check for catches */
+ old_tp->flags &= ~IPIPE_TFLG_NMI_LOCK;
+ if (unlikely(old_tp->flags & IPIPE_TFLG_NMI_HIT)) {
+ /* well, this late tagging may not immediately be visible to
+ * other cpus already dumping this path - a minor issue */
+ point->flags |= IPIPE_TFLG_NMI_HIT;
+
+ /* handle deferred freezing from NMI context */
+ if (old_tp->flags & IPIPE_TFLG_NMI_FREEZE_REQ)
+ __ipipe_trace(IPIPE_TRACE_FREEZE, old_tp->nmi_saved_eip,
+ old_tp->nmi_saved_parent_eip,
+ old_tp->nmi_saved_v);
+ }
+
+ hard_local_irq_restore_notrace(flags);
+}
+
+static unsigned long __ipipe_global_path_lock(void)
+{
+ unsigned long flags;
+ int cpu;
+ struct ipipe_trace_path *tp;
+
+ raw_spin_lock_irqsave(&global_path_lock, flags);
+
+ cpu = ipipe_processor_id();
+ restart:
+ tp = &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)];
+
+ /* here is a small race window with NMIs - caught below */
+
+ /* clear NMI events and set lock (atomically per cpu) */
+ tp->flags = (tp->flags & ~(IPIPE_TFLG_NMI_HIT |
+ IPIPE_TFLG_NMI_FREEZE_REQ))
+ | IPIPE_TFLG_NMI_LOCK;
+
+ /* check active_path again - some nasty NMI may have switched
+ * it meanwhile */
+ if (tp != &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)]) {
+ /* release lock on wrong path and restart */
+ tp->flags &= ~IPIPE_TFLG_NMI_LOCK;
+
+ /* there is no chance that the NMI got deferred
+ * => no need to check for pending freeze requests */
+ goto restart;
+ }
+
+ return flags;
+}
+
+static void __ipipe_global_path_unlock(unsigned long flags)
+{
+ int cpu;
+ struct ipipe_trace_path *tp;
+
+ /* release spinlock first - it's not involved in the NMI issue */
+ __ipipe_spin_unlock_irqbegin(&global_path_lock);
+
+ cpu = ipipe_processor_id();
+ tp = &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)];
+
+ tp->flags &= ~IPIPE_TFLG_NMI_LOCK;
+
+ /* handle deferred freezing from NMI context */
+ if (tp->flags & IPIPE_TFLG_NMI_FREEZE_REQ)
+ __ipipe_trace(IPIPE_TRACE_FREEZE, tp->nmi_saved_eip,
+ tp->nmi_saved_parent_eip, tp->nmi_saved_v);
+
+ /* See __ipipe_spin_lock_irqsave() and friends. */
+ __ipipe_spin_unlock_irqcomplete(flags);
+}
+
+void notrace asmlinkage
+ipipe_trace_asm(enum ipipe_trace_type type, unsigned long eip,
+ unsigned long parent_eip, unsigned long v)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(type, eip, parent_eip, v);
+}
+
+void notrace ipipe_trace_begin(unsigned long v)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(IPIPE_TRACE_BEGIN, CALLER_ADDR0,
+ CALLER_ADDR1, v);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_begin);
+
+void notrace ipipe_trace_end(unsigned long v)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(IPIPE_TRACE_END, CALLER_ADDR0,
+ CALLER_ADDR1, v);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_end);
+
+void notrace ipipe_trace_irqbegin(int irq, struct pt_regs *regs)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(IPIPE_TRACE_BEGIN, instruction_pointer(regs),
+ CALLER_ADDR1, irq);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_irqbegin);
+
+void notrace ipipe_trace_irqend(int irq, struct pt_regs *regs)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(IPIPE_TRACE_END, instruction_pointer(regs),
+ CALLER_ADDR1, irq);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_irqend);
+
+void notrace ipipe_trace_freeze(unsigned long v)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(IPIPE_TRACE_FREEZE, CALLER_ADDR0,
+ CALLER_ADDR1, v);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_freeze);
+
+void notrace ipipe_trace_special(unsigned char id, unsigned long v)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(IPIPE_TRACE_SPECIAL | (id << IPIPE_TYPE_BITS),
+ CALLER_ADDR0,
+ CALLER_ADDR1, v);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_special);
+
+void notrace ipipe_trace_pid(pid_t pid, short prio)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(IPIPE_TRACE_PID | (prio << IPIPE_TYPE_BITS),
+ CALLER_ADDR0,
+ CALLER_ADDR1, pid);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_pid);
+
+void notrace ipipe_trace_event(unsigned char id, unsigned long delay_tsc)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(IPIPE_TRACE_EVENT | (id << IPIPE_TYPE_BITS),
+ CALLER_ADDR0,
+ CALLER_ADDR1, delay_tsc);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_event);
+
+int ipipe_trace_max_reset(void)
+{
+ int cpu;
+ unsigned long flags;
+ struct ipipe_trace_path *path;
+ int ret = 0;
+
+ flags = __ipipe_global_path_lock();
+
+ for_each_possible_cpu(cpu) {
+ path = &per_cpu(trace_path, cpu)[per_cpu(max_path, cpu)];
+
+ if (path->dump_lock) {
+ ret = -EBUSY;
+ break;
+ }
+
+ path->begin = -1;
+ path->end = -1;
+ path->trace_pos = 0;
+ path->length = 0;
+ }
+
+ __ipipe_global_path_unlock(flags);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_max_reset);
+
+int ipipe_trace_frozen_reset(void)
+{
+ int cpu;
+ unsigned long flags;
+ struct ipipe_trace_path *path;
+ int ret = 0;
+
+ flags = __ipipe_global_path_lock();
+
+ for_each_online_cpu(cpu) {
+ path = &per_cpu(trace_path, cpu)[per_cpu(frozen_path, cpu)];
+
+ if (path->dump_lock) {
+ ret = -EBUSY;
+ break;
+ }
+
+ path->begin = -1;
+ path->end = -1;
+ path->trace_pos = 0;
+ path->length = 0;
+ }
+
+ __ipipe_global_path_unlock(flags);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_frozen_reset);
+
+static void
+__ipipe_get_task_info(char *task_info, struct ipipe_trace_point *point,
+ int trylock)
+{
+ struct task_struct *task = NULL;
+ char buf[8];
+ int i;
+ int locked = 1;
+
+ if (trylock) {
+ if (!read_trylock(&tasklist_lock))
+ locked = 0;
+ } else
+ read_lock(&tasklist_lock);
+
+ if (locked)
+ task = find_task_by_pid_ns((pid_t)point->v, &init_pid_ns);
+
+ if (task)
+ strncpy(task_info, task->comm, 11);
+ else
+ strcpy(task_info, "-<?>-");
+
+ if (locked)
+ read_unlock(&tasklist_lock);
+
+ for (i = strlen(task_info); i < 11; i++)
+ task_info[i] = ' ';
+
+ sprintf(buf, " %d ", point->type >> IPIPE_TYPE_BITS);
+ strcpy(task_info + (11 - strlen(buf)), buf);
+}
+
+static void
+__ipipe_get_event_date(char *buf,struct ipipe_trace_path *path,
+ struct ipipe_trace_point *point)
+{
+ long time;
+ int type;
+
+ time = __ipipe_signed_tsc2us(point->timestamp -
+ path->point[path->begin].timestamp + point->v);
+ type = point->type >> IPIPE_TYPE_BITS;
+
+ if (type == 0)
+ /*
+ * Event type #0 is predefined; it stands for the next
+ * timer tick.
+ */
+ sprintf(buf, "tick@%-6ld", time);
+ else
+ sprintf(buf, "%3d@%-7ld", type, time);
+}
+
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+
+void ipipe_trace_panic_freeze(void)
+{
+ unsigned long flags;
+ int cpu;
+
+ if (!ipipe_trace_enable)
+ return;
+
+ ipipe_trace_enable = 0;
+ flags = hard_local_irq_save_notrace();
+
+ cpu = ipipe_processor_id();
+
+ panic_path = &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)];
+
+ hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_panic_freeze);
+
+void ipipe_trace_panic_dump(void)
+{
+ int cnt = back_trace;
+ int start, pos;
+ char buf[16];
+
+ if (!panic_path)
+ return;
+
+ ipipe_context_check_off();
+
+ printk(KERN_EMERG "I-pipe tracer log (%d points):\n", cnt);
+
+ start = pos = WRAP_POINT_NO(panic_path->trace_pos-1);
+
+ while (cnt-- > 0) {
+ struct ipipe_trace_point *point = &panic_path->point[pos];
+ long time;
+ char info[16];
+ int i;
+
+ printk(KERN_EMERG " %c",
+ (point->flags & IPIPE_TFLG_HWIRQ_OFF) ? '|' : ' ');
+
+ for (i = IPIPE_TFLG_DOMSTATE_BITS; i >= 0; i--)
+ printk(KERN_CONT "%c",
+ (IPIPE_TFLG_CURRENT_DOMAIN(point) == i) ?
+ (IPIPE_TFLG_DOMAIN_STALLED(point, i) ?
+ '#' : '+') :
+ (IPIPE_TFLG_DOMAIN_STALLED(point, i) ?
+ '*' : ' '));
+
+ if (!point->eip)
+ printk(KERN_CONT "-<invalid>-\n");
+ else {
+ __ipipe_trace_point_type(buf, point);
+ printk(KERN_CONT "%s", buf);
+
+ switch (point->type & IPIPE_TYPE_MASK) {
+ case IPIPE_TRACE_FUNC:
+ printk(KERN_CONT " ");
+ break;
+
+ case IPIPE_TRACE_PID:
+ __ipipe_get_task_info(info,
+ point, 1);
+ printk(KERN_CONT "%s", info);
+ break;
+
+ case IPIPE_TRACE_EVENT:
+ __ipipe_get_event_date(info,
+ panic_path,
+ point);
+ printk(KERN_CONT "%s", info);
+ break;
+
+ default:
+ printk(KERN_CONT "0x%08lx ", point->v);
+ }
+
+ time = __ipipe_signed_tsc2us(point->timestamp -
+ panic_path->point[start].timestamp);
+ printk(KERN_CONT " %5ld ", time);
+
+ __ipipe_print_symname(NULL, point->eip);
+ printk(KERN_CONT " (");
+ __ipipe_print_symname(NULL, point->parent_eip);
+ printk(KERN_CONT ")\n");
+ }
+ pos = WRAP_POINT_NO(pos - 1);
+ }
+
+ panic_path = NULL;
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_panic_dump);
+
+static int ipipe_trace_panic_handler(struct notifier_block *this,
+ unsigned long event, void *unused)
+{
+ ipipe_trace_panic_dump();
+ return NOTIFY_OK;
+}
+
+static struct notifier_block ipipe_trace_panic_notifier = {
+ .notifier_call = ipipe_trace_panic_handler,
+ .priority = 150,
+};
+
+static int ipipe_trace_die_handler(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ switch (val) {
+ case DIE_OOPS:
+ ipipe_trace_panic_dump();
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block ipipe_trace_die_notifier = {
+ .notifier_call = ipipe_trace_die_handler,
+ .priority = 200,
+};
+
+#endif /* CONFIG_IPIPE_TRACE_PANIC */
+
+
+/* --- /proc output --- */
+
+static notrace int __ipipe_in_critical_trpath(long point_no)
+{
+ return ((WRAP_POINT_NO(point_no-print_path->begin) <
+ WRAP_POINT_NO(print_path->end-print_path->begin)) ||
+ ((print_path->end == print_path->begin) &&
+ (WRAP_POINT_NO(point_no-print_path->end) >
+ print_post_trace)));
+}
+
+static long __ipipe_signed_tsc2us(long long tsc)
+{
+ unsigned long long abs_tsc;
+ long us;
+
+ if (!__ipipe_hrclock_ok())
+ return 0;
+
+ /* ipipe_tsc2us works on unsigned => handle sign separately */
+ abs_tsc = (tsc >= 0) ? tsc : -tsc;
+ us = ipipe_tsc2us(abs_tsc);
+ if (tsc < 0)
+ return -us;
+ else
+ return us;
+}
+
+static void
+__ipipe_trace_point_type(char *buf, struct ipipe_trace_point *point)
+{
+ switch (point->type & IPIPE_TYPE_MASK) {
+ case IPIPE_TRACE_FUNC:
+ strcpy(buf, "func ");
+ break;
+
+ case IPIPE_TRACE_BEGIN:
+ strcpy(buf, "begin ");
+ break;
+
+ case IPIPE_TRACE_END:
+ strcpy(buf, "end ");
+ break;
+
+ case IPIPE_TRACE_FREEZE:
+ strcpy(buf, "freeze ");
+ break;
+
+ case IPIPE_TRACE_SPECIAL:
+ sprintf(buf, "(0x%02x) ",
+ point->type >> IPIPE_TYPE_BITS);
+ break;
+
+ case IPIPE_TRACE_PID:
+ sprintf(buf, "[%5d] ", (pid_t)point->v);
+ break;
+
+ case IPIPE_TRACE_EVENT:
+ sprintf(buf, "event ");
+ break;
+ }
+}
+
+static void
+__ipipe_print_pathmark(struct seq_file *m, struct ipipe_trace_point *point)
+{
+ char mark = ' ';
+ int point_no = point - print_path->point;
+ int i;
+
+ if (print_path->end == point_no)
+ mark = '<';
+ else if (print_path->begin == point_no)
+ mark = '>';
+ else if (__ipipe_in_critical_trpath(point_no))
+ mark = ':';
+ seq_printf(m, "%c%c", mark,
+ (point->flags & IPIPE_TFLG_HWIRQ_OFF) ? '|' : ' ');
+
+ if (!verbose_trace)
+ return;
+
+ for (i = IPIPE_TFLG_DOMSTATE_BITS; i >= 0; i--)
+ seq_printf(m, "%c",
+ (IPIPE_TFLG_CURRENT_DOMAIN(point) == i) ?
+ (IPIPE_TFLG_DOMAIN_STALLED(point, i) ?
+ '#' : '+') :
+ (IPIPE_TFLG_DOMAIN_STALLED(point, i) ? '*' : ' '));
+}
+
+static void
+__ipipe_print_delay(struct seq_file *m, struct ipipe_trace_point *point)
+{
+ unsigned long delay = 0;
+ int next;
+ char *mark = " ";
+
+ next = WRAP_POINT_NO(point+1 - print_path->point);
+
+ if (next != print_path->trace_pos)
+ delay = ipipe_tsc2ns(print_path->point[next].timestamp -
+ point->timestamp);
+
+ if (__ipipe_in_critical_trpath(point - print_path->point)) {
+ if (delay > IPIPE_DELAY_WARN)
+ mark = "! ";
+ else if (delay > IPIPE_DELAY_NOTE)
+ mark = "+ ";
+ }
+ seq_puts(m, mark);
+
+ if (verbose_trace)
+ seq_printf(m, "%3lu.%03lu%c ", delay/1000, delay%1000,
+ (point->flags & IPIPE_TFLG_NMI_HIT) ? 'N' : ' ');
+ else
+ seq_puts(m, " ");
+}
+
+static void __ipipe_print_symname(struct seq_file *m, unsigned long eip)
+{
+ char namebuf[KSYM_NAME_LEN+1];
+ unsigned long size, offset;
+ const char *sym_name;
+ char *modname;
+
+ sym_name = kallsyms_lookup(eip, &size, &offset, &modname, namebuf);
+
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+ if (!m) {
+ /* panic dump */
+ if (sym_name) {
+ printk(KERN_CONT "%s+0x%lx", sym_name, offset);
+ if (modname)
+ printk(KERN_CONT " [%s]", modname);
+ } else
+ printk(KERN_CONT "<%08lx>", eip);
+ } else
+#endif /* CONFIG_IPIPE_TRACE_PANIC */
+ {
+ if (sym_name) {
+ if (verbose_trace) {
+ seq_printf(m, "%s+0x%lx", sym_name, offset);
+ if (modname)
+ seq_printf(m, " [%s]", modname);
+ } else
+ seq_puts(m, sym_name);
+ } else
+ seq_printf(m, "<%08lx>", eip);
+ }
+}
+
+static void __ipipe_print_headline(struct seq_file *m)
+{
+ const char *name[2];
+
+ seq_printf(m, "Calibrated minimum trace-point overhead: %lu.%03lu "
+ "us\n\n", trace_overhead/1000, trace_overhead%1000);
+
+ if (verbose_trace) {
+ name[0] = ipipe_root_domain->name;
+ if (ipipe_head_domain != ipipe_root_domain)
+ name[1] = ipipe_head_domain->name;
+ else
+ name[1] = "<unused>";
+
+ seq_printf(m,
+ " +----- Hard IRQs ('|': locked)\n"
+ " |+-- %s\n"
+ " ||+- %s%s\n"
+ " ||| +---------- "
+ "Delay flag ('+': > %d us, '!': > %d us)\n"
+ " ||| | +- "
+ "NMI noise ('N')\n"
+ " ||| | |\n"
+ " Type User Val. Time Delay Function "
+ "(Parent)\n",
+ name[1], name[0],
+ " ('*': domain stalled, '+': current, "
+ "'#': current+stalled)",
+ IPIPE_DELAY_NOTE/1000, IPIPE_DELAY_WARN/1000);
+ } else
+ seq_printf(m,
+ " +--------------- Hard IRQs ('|': locked)\n"
+ " | +- Delay flag "
+ "('+': > %d us, '!': > %d us)\n"
+ " | |\n"
+ " Type Time Function (Parent)\n",
+ IPIPE_DELAY_NOTE/1000, IPIPE_DELAY_WARN/1000);
+}
+
+static void *__ipipe_max_prtrace_start(struct seq_file *m, loff_t *pos)
+{
+ loff_t n = *pos;
+
+ mutex_lock(&out_mutex);
+
+ if (!n) {
+ struct ipipe_trace_path *tp;
+ unsigned long length_usecs;
+ int points, cpu;
+ unsigned long flags;
+
+ /* protect against max_path/frozen_path updates while we
+ * haven't locked our target path, also avoid recursively
+ * taking global_path_lock from NMI context */
+ flags = __ipipe_global_path_lock();
+
+ /* find the longest of all per-cpu paths */
+ print_path = NULL;
+ for_each_online_cpu(cpu) {
+ tp = &per_cpu(trace_path, cpu)[per_cpu(max_path, cpu)];
+ if ((print_path == NULL) ||
+ (tp->length > print_path->length))
+ print_path = tp;
+ }
+ print_path->dump_lock = 1;
+
+ __ipipe_global_path_unlock(flags);
+
+ if (!__ipipe_hrclock_ok()) {
+ seq_printf(m, "No hrclock available, dumping traces disabled\n");
+ return NULL;
+ }
+
+ /* does this path actually contain data? */
+ if (print_path->end == print_path->begin)
+ return NULL;
+
+ /* number of points inside the critical path */
+ points = WRAP_POINT_NO(print_path->end-print_path->begin+1);
+
+ /* pre- and post-tracing length, post-trace length was frozen
+ in __ipipe_trace, pre-trace may have to be reduced due to
+ buffer overrun */
+ print_pre_trace = pre_trace;
+ print_post_trace = WRAP_POINT_NO(print_path->trace_pos -
+ print_path->end - 1);
+ if (points+pre_trace+print_post_trace > IPIPE_TRACE_POINTS - 1)
+ print_pre_trace = IPIPE_TRACE_POINTS - 1 - points -
+ print_post_trace;
+
+ length_usecs = ipipe_tsc2us(print_path->length);
+ seq_printf(m, "I-pipe worst-case tracing service on %s/ipipe release #%d\n"
+ "-------------------------------------------------------------\n",
+ UTS_RELEASE, IPIPE_CORE_RELEASE);
+ seq_printf(m, "CPU: %d, Begin: %lld cycles, Trace Points: "
+ "%d (-%d/+%d), Length: %lu us\n",
+ cpu, print_path->point[print_path->begin].timestamp,
+ points, print_pre_trace, print_post_trace, length_usecs);
+ __ipipe_print_headline(m);
+ }
+
+ /* check if we are inside the trace range */
+ if (n >= WRAP_POINT_NO(print_path->end - print_path->begin + 1 +
+ print_pre_trace + print_post_trace))
+ return NULL;
+
+ /* return the next point to be shown */
+ return &print_path->point[WRAP_POINT_NO(print_path->begin -
+ print_pre_trace + n)];
+}
+
+static void *__ipipe_prtrace_next(struct seq_file *m, void *p, loff_t *pos)
+{
+ loff_t n = ++*pos;
+
+ /* check if we are inside the trace range with the next entry */
+ if (n >= WRAP_POINT_NO(print_path->end - print_path->begin + 1 +
+ print_pre_trace + print_post_trace))
+ return NULL;
+
+ /* return the next point to be shown */
+ return &print_path->point[WRAP_POINT_NO(print_path->begin -
+ print_pre_trace + *pos)];
+}
+
+static void __ipipe_prtrace_stop(struct seq_file *m, void *p)
+{
+ if (print_path)
+ print_path->dump_lock = 0;
+ mutex_unlock(&out_mutex);
+}
+
+static int __ipipe_prtrace_show(struct seq_file *m, void *p)
+{
+ long time;
+ struct ipipe_trace_point *point = p;
+ char buf[16];
+
+ if (!point->eip) {
+ seq_puts(m, "-<invalid>-\n");
+ return 0;
+ }
+
+ __ipipe_print_pathmark(m, point);
+ __ipipe_trace_point_type(buf, point);
+ seq_puts(m, buf);
+ if (verbose_trace)
+ switch (point->type & IPIPE_TYPE_MASK) {
+ case IPIPE_TRACE_FUNC:
+ seq_puts(m, " ");
+ break;
+
+ case IPIPE_TRACE_PID:
+ __ipipe_get_task_info(buf, point, 0);
+ seq_puts(m, buf);
+ break;
+
+ case IPIPE_TRACE_EVENT:
+ __ipipe_get_event_date(buf, print_path, point);
+ seq_puts(m, buf);
+ break;
+
+ default:
+ seq_printf(m, "0x%08lx ", point->v);
+ }
+
+ time = __ipipe_signed_tsc2us(point->timestamp -
+ print_path->point[print_path->begin].timestamp);
+ seq_printf(m, "%5ld", time);
+
+ __ipipe_print_delay(m, point);
+ __ipipe_print_symname(m, point->eip);
+ seq_puts(m, " (");
+ __ipipe_print_symname(m, point->parent_eip);
+ seq_puts(m, ")\n");
+
+ return 0;
+}
+
+static struct seq_operations __ipipe_max_ptrace_ops = {
+ .start = __ipipe_max_prtrace_start,
+ .next = __ipipe_prtrace_next,
+ .stop = __ipipe_prtrace_stop,
+ .show = __ipipe_prtrace_show
+};
+
+static int __ipipe_max_prtrace_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &__ipipe_max_ptrace_ops);
+}
+
+static ssize_t
+__ipipe_max_reset(struct file *file, const char __user *pbuffer,
+ size_t count, loff_t *data)
+{
+ mutex_lock(&out_mutex);
+ ipipe_trace_max_reset();
+ mutex_unlock(&out_mutex);
+
+ return count;
+}
+
+static const struct file_operations __ipipe_max_prtrace_fops = {
+ .open = __ipipe_max_prtrace_open,
+ .read = seq_read,
+ .write = __ipipe_max_reset,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static void *__ipipe_frozen_prtrace_start(struct seq_file *m, loff_t *pos)
+{
+ loff_t n = *pos;
+
+ mutex_lock(&out_mutex);
+
+ if (!n) {
+ struct ipipe_trace_path *tp;
+ int cpu;
+ unsigned long flags;
+
+ /* protect against max_path/frozen_path updates while we
+ * haven't locked our target path, also avoid recursively
+ * taking global_path_lock from NMI context */
+ flags = __ipipe_global_path_lock();
+
+ /* find the first of all per-cpu frozen paths */
+ print_path = NULL;
+ for_each_online_cpu(cpu) {
+ tp = &per_cpu(trace_path, cpu)[per_cpu(frozen_path, cpu)];
+ if (tp->end >= 0) {
+ print_path = tp;
+ break;
+ }
+ }
+ if (print_path)
+ print_path->dump_lock = 1;
+
+ __ipipe_global_path_unlock(flags);
+
+ if (!print_path)
+ return NULL;
+
+ if (!__ipipe_hrclock_ok()) {
+ seq_printf(m, "No hrclock available, dumping traces disabled\n");
+ return NULL;
+ }
+
+ /* back- and post-tracing length, post-trace length was frozen
+ in __ipipe_trace, back-trace may have to be reduced due to
+ buffer overrun */
+ print_pre_trace = back_trace-1; /* subtract freeze point */
+ print_post_trace = WRAP_POINT_NO(print_path->trace_pos -
+ print_path->end - 1);
+ if (1+pre_trace+print_post_trace > IPIPE_TRACE_POINTS - 1)
+ print_pre_trace = IPIPE_TRACE_POINTS - 2 -
+ print_post_trace;
+
+ seq_printf(m, "I-pipe frozen back-tracing service on %s/ipipe release #%d\n"
+ "------------------------------------------------------------\n",
+ UTS_RELEASE, IPIPE_CORE_RELEASE);
+ seq_printf(m, "CPU: %d, Freeze: %lld cycles, Trace Points: %d (+%d)\n",
+ cpu, print_path->point[print_path->begin].timestamp,
+ print_pre_trace+1, print_post_trace);
+ __ipipe_print_headline(m);
+ }
+
+ /* check if we are inside the trace range */
+ if (n >= print_pre_trace + 1 + print_post_trace)
+ return NULL;
+
+ /* return the next point to be shown */
+ return &print_path->point[WRAP_POINT_NO(print_path->begin-
+ print_pre_trace+n)];
+}
+
+static struct seq_operations __ipipe_frozen_ptrace_ops = {
+ .start = __ipipe_frozen_prtrace_start,
+ .next = __ipipe_prtrace_next,
+ .stop = __ipipe_prtrace_stop,
+ .show = __ipipe_prtrace_show
+};
+
+static int __ipipe_frozen_prtrace_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &__ipipe_frozen_ptrace_ops);
+}
+
+static ssize_t
+__ipipe_frozen_ctrl(struct file *file, const char __user *pbuffer,
+ size_t count, loff_t *data)
+{
+ char *end, buf[16];
+ int val;
+ int n;
+
+ n = (count > sizeof(buf) - 1) ? sizeof(buf) - 1 : count;
+
+ if (copy_from_user(buf, pbuffer, n))
+ return -EFAULT;
+
+ buf[n] = '\0';
+ val = simple_strtol(buf, &end, 0);
+
+ if (((*end != '\0') && !isspace(*end)) || (val < 0))
+ return -EINVAL;
+
+ mutex_lock(&out_mutex);
+ ipipe_trace_frozen_reset();
+ if (val > 0)
+ ipipe_trace_freeze(-1);
+ mutex_unlock(&out_mutex);
+
+ return count;
+}
+
+static const struct file_operations __ipipe_frozen_prtrace_fops = {
+ .open = __ipipe_frozen_prtrace_open,
+ .read = seq_read,
+ .write = __ipipe_frozen_ctrl,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static int __ipipe_rd_proc_val(struct seq_file *p, void *data)
+{
+ seq_printf(p, "%u\n", *(int *)p->private);
+ return 0;
+}
+
+static ssize_t
+__ipipe_wr_proc_val(struct file *file, const char __user *buffer,
+ size_t count, loff_t *data)
+{
+ struct seq_file *p = file->private_data;
+ char *end, buf[16];
+ int val;
+ int n;
+
+ n = (count > sizeof(buf) - 1) ? sizeof(buf) - 1 : count;
+
+ if (copy_from_user(buf, buffer, n))
+ return -EFAULT;
+
+ buf[n] = '\0';
+ val = simple_strtol(buf, &end, 0);
+
+ if (((*end != '\0') && !isspace(*end)) || (val < 0))
+ return -EINVAL;
+
+ mutex_lock(&out_mutex);
+ *(int *)p->private = val;
+ mutex_unlock(&out_mutex);
+
+ return count;
+}
+
+static int __ipipe_rw_proc_val_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, __ipipe_rd_proc_val, PDE_DATA(inode));
+}
+
+static const struct file_operations __ipipe_rw_proc_val_ops = {
+ .open = __ipipe_rw_proc_val_open,
+ .read = seq_read,
+ .write = __ipipe_wr_proc_val,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static void __init
+__ipipe_create_trace_proc_val(struct proc_dir_entry *trace_dir,
+ const char *name, int *value_ptr)
+{
+ proc_create_data(name, 0644, trace_dir, &__ipipe_rw_proc_val_ops,
+ value_ptr);
+}
+
+static int __ipipe_rd_trigger(struct seq_file *p, void *data)
+{
+ char str[KSYM_SYMBOL_LEN];
+
+ if (trigger_begin) {
+ sprint_symbol(str, trigger_begin);
+ seq_printf(p, "%s\n", str);
+ }
+ return 0;
+}
+
+static ssize_t
+__ipipe_wr_trigger(struct file *file, const char __user *buffer,
+ size_t count, loff_t *data)
+{
+ char buf[KSYM_SYMBOL_LEN];
+ unsigned long begin, end;
+
+ if (count > sizeof(buf) - 1)
+ count = sizeof(buf) - 1;
+ if (copy_from_user(buf, buffer, count))
+ return -EFAULT;
+ buf[count] = 0;
+ if (buf[count-1] == '\n')
+ buf[count-1] = 0;
+
+ begin = kallsyms_lookup_name(buf);
+ if (!begin || !kallsyms_lookup_size_offset(begin, &end, NULL))
+ return -ENOENT;
+ end += begin - 1;
+
+ mutex_lock(&out_mutex);
+ /* invalidate the current range before setting a new one */
+ trigger_end = 0;
+ wmb();
+ ipipe_trace_frozen_reset();
+
+ /* set new range */
+ trigger_begin = begin;
+ wmb();
+ trigger_end = end;
+ mutex_unlock(&out_mutex);
+
+ return count;
+}
+
+static int __ipipe_rw_trigger_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, __ipipe_rd_trigger, NULL);
+}
+
+static const struct file_operations __ipipe_rw_trigger_ops = {
+ .open = __ipipe_rw_trigger_open,
+ .read = seq_read,
+ .write = __ipipe_wr_trigger,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+
+#ifdef CONFIG_IPIPE_TRACE_MCOUNT
+static void notrace
+ipipe_trace_function(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *op, struct pt_regs *regs)
+{
+ if (!ipipe_trace_enable)
+ return;
+ __ipipe_trace(IPIPE_TRACE_FUNC, ip, parent_ip, 0);
+}
+
+static struct ftrace_ops ipipe_trace_ops = {
+ .func = ipipe_trace_function,
+ .flags = FTRACE_OPS_FL_IPIPE_EXCLUSIVE,
+};
+
+static ssize_t __ipipe_wr_enable(struct file *file, const char __user *buffer,
+ size_t count, loff_t *data)
+{
+ char *end, buf[16];
+ int val;
+ int n;
+
+ n = (count > sizeof(buf) - 1) ? sizeof(buf) - 1 : count;
+
+ if (copy_from_user(buf, buffer, n))
+ return -EFAULT;
+
+ buf[n] = '\0';
+ val = simple_strtol(buf, &end, 0);
+
+ if (((*end != '\0') && !isspace(*end)) || (val < 0))
+ return -EINVAL;
+
+ mutex_lock(&out_mutex);
+
+ if (ipipe_trace_enable) {
+ if (!val)
+ unregister_ftrace_function(&ipipe_trace_ops);
+ } else if (val)
+ register_ftrace_function(&ipipe_trace_ops);
+
+ ipipe_trace_enable = val;
+
+ mutex_unlock(&out_mutex);
+
+ return count;
+}
+
+static const struct file_operations __ipipe_rw_enable_ops = {
+ .open = __ipipe_rw_proc_val_open,
+ .read = seq_read,
+ .write = __ipipe_wr_enable,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+#endif /* CONFIG_IPIPE_TRACE_MCOUNT */
+
+extern struct proc_dir_entry *ipipe_proc_root;
+
+void __init __ipipe_tracer_hrclock_initialized(void)
+{
+ unsigned long long start, end, min = ULLONG_MAX;
+ int i;
+
+#ifdef CONFIG_IPIPE_TRACE_VMALLOC
+ if (!per_cpu(trace_path, 0))
+ return;
+#endif
+ /* Calculate minimum overhead of __ipipe_trace() */
+ hard_local_irq_disable();
+ for (i = 0; i < 100; i++) {
+ ipipe_read_tsc(start);
+ __ipipe_trace(IPIPE_TRACE_FUNC, CALLER_ADDR0,
+ CALLER_ADDR1, 0);
+ ipipe_read_tsc(end);
+
+ end -= start;
+ if (end < min)
+ min = end;
+ }
+ hard_local_irq_enable();
+ trace_overhead = ipipe_tsc2ns(min);
+}
+
+void __init __ipipe_init_tracer(void)
+{
+ struct proc_dir_entry *trace_dir;
+#ifdef CONFIG_IPIPE_TRACE_VMALLOC
+ int cpu, path;
+#endif /* CONFIG_IPIPE_TRACE_VMALLOC */
+
+#ifdef CONFIG_IPIPE_TRACE_VMALLOC
+ for_each_possible_cpu(cpu) {
+ struct ipipe_trace_path *tp_buf;
+
+ tp_buf = vmalloc_node(sizeof(struct ipipe_trace_path) *
+ IPIPE_TRACE_PATHS, cpu_to_node(cpu));
+ if (!tp_buf) {
+ pr_err("I-pipe: "
+ "insufficient memory for trace buffer.\n");
+ return;
+ }
+ memset(tp_buf, 0,
+ sizeof(struct ipipe_trace_path) * IPIPE_TRACE_PATHS);
+ for (path = 0; path < IPIPE_TRACE_PATHS; path++) {
+ tp_buf[path].begin = -1;
+ tp_buf[path].end = -1;
+ }
+ per_cpu(trace_path, cpu) = tp_buf;
+ }
+#endif /* CONFIG_IPIPE_TRACE_VMALLOC */
+
+ if (__ipipe_hrclock_ok() && !trace_overhead)
+ __ipipe_tracer_hrclock_initialized();
+
+#ifdef CONFIG_IPIPE_TRACE_ENABLE
+ ipipe_trace_enable = 1;
+#ifdef CONFIG_IPIPE_TRACE_MCOUNT
+ ftrace_enabled = 1;
+ register_ftrace_function(&ipipe_trace_ops);
+#endif /* CONFIG_IPIPE_TRACE_MCOUNT */
+#endif /* CONFIG_IPIPE_TRACE_ENABLE */
+
+ trace_dir = proc_mkdir("trace", ipipe_proc_root);
+
+ proc_create("max", 0644, trace_dir, &__ipipe_max_prtrace_fops);
+ proc_create("frozen", 0644, trace_dir, &__ipipe_frozen_prtrace_fops);
+
+ proc_create("trigger", 0644, trace_dir, &__ipipe_rw_trigger_ops);
+
+ __ipipe_create_trace_proc_val(trace_dir, "pre_trace_points",
+ &pre_trace);
+ __ipipe_create_trace_proc_val(trace_dir, "post_trace_points",
+ &post_trace);
+ __ipipe_create_trace_proc_val(trace_dir, "back_trace_points",
+ &back_trace);
+ __ipipe_create_trace_proc_val(trace_dir, "verbose",
+ &verbose_trace);
+#ifdef CONFIG_IPIPE_TRACE_MCOUNT
+ proc_create_data("enable", 0644, trace_dir, &__ipipe_rw_enable_ops,
+ &ipipe_trace_enable);
+#else /* !CONFIG_IPIPE_TRACE_MCOUNT */
+ __ipipe_create_trace_proc_val(trace_dir, "enable",
+ &ipipe_trace_enable);
+#endif /* !CONFIG_IPIPE_TRACE_MCOUNT */
+
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+ atomic_notifier_chain_register(&panic_notifier_list,
+ &ipipe_trace_panic_notifier);
+ register_die_notifier(&ipipe_trace_die_notifier);
+#endif /* CONFIG_IPIPE_TRACE_PANIC */
+}
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 521121c2666c..cf7817a1e28e 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -14,6 +14,7 @@
#include <linux/interrupt.h>
#include <linux/kernel_stat.h>
#include <linux/irqdomain.h>
+#include <linux/ipipe.h>
#include <trace/events/irq.h>
@@ -48,6 +49,10 @@ int irq_set_chip(unsigned int irq, struct irq_chip *chip)
if (!chip)
chip = &no_irq_chip;
+ else
+ WARN_ONCE(IS_ENABLED(CONFIG_IPIPE) &&
+ (chip->flags & IRQCHIP_PIPELINE_SAFE) == 0,
+ "irqchip %s is not pipeline-safe!", chip->name);
desc->irq_data.chip = chip;
irq_put_desc_unlock(desc, flags);
@@ -155,14 +160,6 @@ int irq_set_chip_data(unsigned int irq, void *data)
}
EXPORT_SYMBOL(irq_set_chip_data);
-struct irq_data *irq_get_irq_data(unsigned int irq)
-{
- struct irq_desc *desc = irq_to_desc(irq);
-
- return desc ? &desc->irq_data : NULL;
-}
-EXPORT_SYMBOL_GPL(irq_get_irq_data);
-
static void irq_state_clr_disabled(struct irq_desc *desc)
{
irqd_clear(&desc->irq_data, IRQD_IRQ_DISABLED);
@@ -242,9 +239,14 @@ static int __irq_startup(struct irq_desc *desc)
WARN_ON_ONCE(!irqd_is_activated(d));
if (d->chip->irq_startup) {
+ unsigned long flags = hard_cond_local_irq_save();
ret = d->chip->irq_startup(d);
irq_state_clr_disabled(desc);
irq_state_clr_masked(desc);
+ hard_cond_local_irq_restore(flags);
+#ifdef CONFIG_IPIPE
+ desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
+#endif
} else {
irq_enable(desc);
}
@@ -312,6 +314,9 @@ void irq_shutdown(struct irq_desc *desc)
desc->irq_data.chip->irq_shutdown(&desc->irq_data);
irq_state_set_disabled(desc);
irq_state_set_masked(desc);
+#ifdef CONFIG_IPIPE
+ desc->istate |= IPIPE_IRQS_NEEDS_STARTUP;
+#endif
} else {
__irq_disable(desc, true);
}
@@ -334,6 +339,8 @@ void irq_shutdown_and_deactivate(struct irq_desc *desc)
void irq_enable(struct irq_desc *desc)
{
+ unsigned long flags = hard_cond_local_irq_save();
+
if (!irqd_irq_disabled(&desc->irq_data)) {
unmask_irq(desc);
} else {
@@ -345,10 +352,14 @@ void irq_enable(struct irq_desc *desc)
unmask_irq(desc);
}
}
+
+ hard_cond_local_irq_restore(flags);
}
static void __irq_disable(struct irq_desc *desc, bool mask)
{
+ unsigned long flags = hard_cond_local_irq_save();
+
if (irqd_irq_disabled(&desc->irq_data)) {
if (mask)
mask_irq(desc);
@@ -361,6 +372,8 @@ static void __irq_disable(struct irq_desc *desc, bool mask)
mask_irq(desc);
}
}
+
+ hard_cond_local_irq_restore(flags);
}
/**
@@ -390,11 +403,13 @@ void irq_disable(struct irq_desc *desc)
void irq_percpu_enable(struct irq_desc *desc, unsigned int cpu)
{
+ unsigned long flags = hard_cond_local_irq_save();
if (desc->irq_data.chip->irq_enable)
desc->irq_data.chip->irq_enable(&desc->irq_data);
else
desc->irq_data.chip->irq_unmask(&desc->irq_data);
cpumask_set_cpu(cpu, desc->percpu_enabled);
+ hard_cond_local_irq_restore(flags);
}
void irq_percpu_disable(struct irq_desc *desc, unsigned int cpu)
@@ -431,12 +446,16 @@ void mask_irq(struct irq_desc *desc)
void unmask_irq(struct irq_desc *desc)
{
+ unsigned long flags;
+
if (!irqd_irq_masked(&desc->irq_data))
return;
if (desc->irq_data.chip->irq_unmask) {
+ flags = hard_cond_local_irq_save();
desc->irq_data.chip->irq_unmask(&desc->irq_data);
irq_state_clr_masked(desc);
+ hard_cond_local_irq_restore(flags);
}
}
@@ -633,7 +652,9 @@ static void cond_unmask_irq(struct irq_desc *desc)
void handle_level_irq(struct irq_desc *desc)
{
raw_spin_lock(&desc->lock);
+#ifndef CONFIG_IPIPE
mask_ack_irq(desc);
+#endif
if (!irq_may_run(desc))
goto out_unlock;
@@ -669,7 +690,16 @@ static inline void preflow_handler(struct irq_desc *desc)
static inline void preflow_handler(struct irq_desc *desc) { }
#endif
-static void cond_unmask_eoi_irq(struct irq_desc *desc, struct irq_chip *chip)
+#ifdef CONFIG_IPIPE
+static void cond_release_fasteoi_irq(struct irq_desc *desc,
+ struct irq_chip *chip)
+{
+ if (chip->irq_release &&
+ !irqd_irq_disabled(&desc->irq_data) && !desc->threads_oneshot)
+ chip->irq_release(&desc->irq_data);
+}
+#else
+static inline void cond_unmask_eoi_irq(struct irq_desc *desc, struct irq_chip *chip)
{
if (!(desc->istate & IRQS_ONESHOT)) {
chip->irq_eoi(&desc->irq_data);
@@ -689,6 +719,7 @@ static void cond_unmask_eoi_irq(struct irq_desc *desc, struct irq_chip *chip)
chip->irq_eoi(&desc->irq_data);
}
}
+#endif /* !CONFIG_IPIPE */
/**
* handle_fasteoi_irq - irq handler for transparent controllers
@@ -721,13 +752,23 @@ void handle_fasteoi_irq(struct irq_desc *desc)
}
kstat_incr_irqs_this_cpu(desc);
+#ifndef CONFIG_IPIPE
if (desc->istate & IRQS_ONESHOT)
mask_irq(desc);
+#endif
preflow_handler(desc);
handle_irq_event(desc);
+#ifdef CONFIG_IPIPE
+ /*
+ * IRQCHIP_EOI_IF_HANDLED is ignored as the I-pipe always
+ * sends EOI.
+ */
+ cond_release_fasteoi_irq(desc, chip);
+#else /* !CONFIG_IPIPE */
cond_unmask_eoi_irq(desc, chip);
+#endif /* !CONFIG_IPIPE */
raw_spin_unlock(&desc->lock);
return;
@@ -811,7 +852,9 @@ void handle_edge_irq(struct irq_desc *desc)
kstat_incr_irqs_this_cpu(desc);
/* Start handling the irq */
+#ifndef CONFIG_IPIPE
desc->irq_data.chip->irq_ack(&desc->irq_data);
+#endif
do {
if (unlikely(!desc->action)) {
@@ -903,6 +946,11 @@ void handle_percpu_irq(struct irq_desc *desc)
*/
__kstat_incr_irqs_this_cpu(desc);
+#ifdef CONFIG_IPIPE
+ (void)chip;
+ handle_irq_event_percpu(desc);
+ desc->ipipe_end(desc);
+#else
if (chip->irq_ack)
chip->irq_ack(&desc->irq_data);
@@ -910,6 +958,7 @@ void handle_percpu_irq(struct irq_desc *desc)
if (chip->irq_eoi)
chip->irq_eoi(&desc->irq_data);
+#endif
}
/**
@@ -936,13 +985,20 @@ void handle_percpu_devid_irq(struct irq_desc *desc)
*/
__kstat_incr_irqs_this_cpu(desc);
+#ifndef CONFIG_IPIPE
if (chip->irq_ack)
chip->irq_ack(&desc->irq_data);
+#endif
if (likely(action)) {
trace_irq_handler_entry(irq, action);
res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
trace_irq_handler_exit(irq, action, res);
+#ifdef CONFIG_IPIPE
+ (void)chip;
+ desc->ipipe_end(desc);
+ return;
+#endif
} else {
unsigned int cpu = smp_processor_id();
bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);
@@ -983,6 +1039,170 @@ void handle_percpu_devid_fasteoi_nmi(struct irq_desc *desc)
chip->irq_eoi(&desc->irq_data);
}
+#ifdef CONFIG_IPIPE
+
+void __ipipe_ack_level_irq(struct irq_desc *desc)
+{
+ mask_ack_irq(desc);
+}
+
+void __ipipe_end_level_irq(struct irq_desc *desc)
+{
+ unmask_irq(desc);
+}
+
+void __ipipe_ack_fasteoi_irq(struct irq_desc *desc)
+{
+ desc->irq_data.chip->irq_hold(&desc->irq_data);
+}
+
+void __ipipe_end_fasteoi_irq(struct irq_desc *desc)
+{
+ if (desc->irq_data.chip->irq_release)
+ desc->irq_data.chip->irq_release(&desc->irq_data);
+}
+
+void __ipipe_ack_edge_irq(struct irq_desc *desc)
+{
+ desc->irq_data.chip->irq_ack(&desc->irq_data);
+}
+
+void __ipipe_ack_percpu_irq(struct irq_desc *desc)
+{
+ if (desc->irq_data.chip->irq_ack)
+ desc->irq_data.chip->irq_ack(&desc->irq_data);
+
+ if (desc->irq_data.chip->irq_eoi)
+ desc->irq_data.chip->irq_eoi(&desc->irq_data);
+}
+
+void __ipipe_nop_irq(struct irq_desc *desc)
+{
+}
+
+void __ipipe_chained_irq(struct irq_desc *desc)
+{
+ /*
+ * XXX: Do NOT fold this into __ipipe_nop_irq(), see
+ * ipipe_chained_irq_p().
+ */
+}
+
+static void __ipipe_ack_bad_irq(struct irq_desc *desc)
+{
+ handle_bad_irq(desc);
+ WARN_ON_ONCE(1);
+}
+
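+/*
+ * Derive the pipeline ack/end callbacks from the generic flow handler
+ * installed on this descriptor. Chained descriptors keep the original
+ * handler as their ack hook and receive __ipipe_chained_irq() as a
+ * trampoline instead.
+ */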
+irq_flow_handler_t
+__ipipe_setup_irq_desc(struct irq_desc *desc, irq_flow_handler_t handle, int is_chained)
+{
+ if (unlikely(handle == NULL)) {
+ desc->ipipe_ack = __ipipe_ack_bad_irq;
+ desc->ipipe_end = __ipipe_nop_irq;
+ } else {
+ if (is_chained) {
+ desc->ipipe_ack = handle;
+ desc->ipipe_end = __ipipe_nop_irq;
+ handle = __ipipe_chained_irq;
+ } else if (handle == handle_simple_irq) {
+ desc->ipipe_ack = __ipipe_nop_irq;
+ desc->ipipe_end = __ipipe_nop_irq;
+ } else if (handle == handle_level_irq) {
+ desc->ipipe_ack = __ipipe_ack_level_irq;
+ desc->ipipe_end = __ipipe_end_level_irq;
+ } else if (handle == handle_edge_irq) {
+ desc->ipipe_ack = __ipipe_ack_edge_irq;
+ desc->ipipe_end = __ipipe_nop_irq;
+ } else if (handle == handle_fasteoi_irq) {
+ desc->ipipe_ack = __ipipe_ack_fasteoi_irq;
+ desc->ipipe_end = __ipipe_end_fasteoi_irq;
+ } else if (handle == handle_percpu_irq ||
+ handle == handle_percpu_devid_irq) {
+ if (irq_desc_get_chip(desc) &&
+ irq_desc_get_chip(desc)->irq_hold) {
+ desc->ipipe_ack = __ipipe_ack_fasteoi_irq;
+ desc->ipipe_end = __ipipe_end_fasteoi_irq;
+ } else {
+ desc->ipipe_ack = __ipipe_ack_percpu_irq;
+ desc->ipipe_end = __ipipe_nop_irq;
+ }
+ } else if (irq_desc_get_chip(desc) == &no_irq_chip) {
+ desc->ipipe_ack = __ipipe_nop_irq;
+ desc->ipipe_end = __ipipe_nop_irq;
+ } else {
+ desc->ipipe_ack = __ipipe_ack_bad_irq;
+ desc->ipipe_end = __ipipe_nop_irq;
+ }
+ }
+
+ /*
+ * We don't cope well with lazy disabling simply because we
+ * neither track nor update the descriptor state bits, which
+ * is badly wrong.
+ */
+ irq_settings_clr_and_set(desc, 0, _IRQ_DISABLE_UNLAZY);
+
+ /* Suppress intermediate trampoline routine. */
+ ipipe_root_domain->irqs[desc->irq_data.irq].ackfn = desc->ipipe_ack;
+
+ return handle;
+}
+
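+/*
+ * Enable an interrupt from pipeline context: perform the deferred
+ * chip startup the first time the descriptor is enabled, otherwise
+ * fall back to the chip's irq_enable/irq_unmask handlers.
+ */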
+int ipipe_enable_irq(unsigned int irq)
+{
+ struct irq_desc *desc;
+ struct irq_chip *chip;
+ unsigned long flags;
+ int err;
+
+ desc = irq_to_desc(irq);
+ if (desc == NULL)
+ return -EINVAL;
+
+ chip = irq_desc_get_chip(desc);
+
+ if (chip->irq_startup && (desc->istate & IPIPE_IRQS_NEEDS_STARTUP)) {
+
+ ipipe_root_only();
+
+ err = irq_activate(desc);
+ if (err)
+ return err;
+
+ raw_spin_lock_irqsave(&desc->lock, flags);
+ if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
+ desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
+ chip->irq_startup(&desc->irq_data);
+ }
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+ return 0;
+ }
+
+ if (chip->irq_enable == NULL && chip->irq_unmask == NULL)
+ return -ENOSYS;
+
+ if (chip->irq_enable)
+ chip->irq_enable(&desc->irq_data);
+ else
+ chip->irq_unmask(&desc->irq_data);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ipipe_enable_irq);
+
+#else /* !CONFIG_IPIPE */
+
+irq_flow_handler_t
+__ipipe_setup_irq_desc(struct irq_desc *desc, irq_flow_handler_t handle, int is_chained)
+{
+ return handle;
+}
+
+#endif /* !CONFIG_IPIPE */
+EXPORT_SYMBOL_GPL(__ipipe_setup_irq_desc);
+
static void
__irq_do_set_handler(struct irq_desc *desc, irq_flow_handler_t handle,
int is_chained, const char *name)
@@ -1017,6 +1237,8 @@ __irq_do_set_handler(struct irq_desc *desc, irq_flow_handler_t handle,
return;
}
+ handle = __ipipe_setup_irq_desc(desc, handle, is_chained);
+
/* Uninstall? */
if (handle == handle_bad_irq) {
if (desc->irq_data.chip != &no_irq_chip)
@@ -1352,6 +1574,20 @@ void irq_chip_mask_parent(struct irq_data *data)
}
EXPORT_SYMBOL_GPL(irq_chip_mask_parent);
+#ifdef CONFIG_IPIPE
+void irq_chip_hold_parent(struct irq_data *data)
+{
+ data = data->parent_data;
+ data->chip->irq_hold(data);
+}
+
+void irq_chip_release_parent(struct irq_data *data)
+{
+ data = data->parent_data;
+ data->chip->irq_release(data);
+}
+#endif
+
/**
* irq_chip_mask_ack_parent - Mask and acknowledge the parent interrupt
* @data: Pointer to interrupt specific data
diff --git a/kernel/irq/dummychip.c b/kernel/irq/dummychip.c
index 0b0cdf206dc4..7bf8cbee1b87 100644
--- a/kernel/irq/dummychip.c
+++ b/kernel/irq/dummychip.c
@@ -43,7 +43,7 @@ struct irq_chip no_irq_chip = {
.irq_enable = noop,
.irq_disable = noop,
.irq_ack = ack_bad,
- .flags = IRQCHIP_SKIP_SET_WAKE,
+ .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
};
/*
@@ -59,6 +59,6 @@ struct irq_chip dummy_irq_chip = {
.irq_ack = noop,
.irq_mask = noop,
.irq_unmask = noop,
- .flags = IRQCHIP_SKIP_SET_WAKE,
+ .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
};
EXPORT_SYMBOL_GPL(dummy_irq_chip);
diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index e2999a070a99..abb9d41475c7 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -37,12 +37,13 @@ void irq_gc_mask_disable_reg(struct irq_data *d)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct irq_chip_type *ct = irq_data_get_chip_type(d);
+ unsigned long flags;
u32 mask = d->mask;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
irq_reg_writel(gc, mask, ct->regs.disable);
*ct->mask_cache &= ~mask;
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
/**
@@ -56,12 +57,13 @@ void irq_gc_mask_set_bit(struct irq_data *d)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct irq_chip_type *ct = irq_data_get_chip_type(d);
+ unsigned long flags;
u32 mask = d->mask;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
*ct->mask_cache |= mask;
irq_reg_writel(gc, *ct->mask_cache, ct->regs.mask);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
EXPORT_SYMBOL_GPL(irq_gc_mask_set_bit);
@@ -76,12 +78,13 @@ void irq_gc_mask_clr_bit(struct irq_data *d)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct irq_chip_type *ct = irq_data_get_chip_type(d);
+ unsigned long flags;
u32 mask = d->mask;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
*ct->mask_cache &= ~mask;
irq_reg_writel(gc, *ct->mask_cache, ct->regs.mask);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
EXPORT_SYMBOL_GPL(irq_gc_mask_clr_bit);
@@ -96,12 +99,13 @@ void irq_gc_unmask_enable_reg(struct irq_data *d)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct irq_chip_type *ct = irq_data_get_chip_type(d);
+ unsigned long flags;
u32 mask = d->mask;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
irq_reg_writel(gc, mask, ct->regs.enable);
*ct->mask_cache |= mask;
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
/**
@@ -112,11 +116,12 @@ void irq_gc_ack_set_bit(struct irq_data *d)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct irq_chip_type *ct = irq_data_get_chip_type(d);
+ unsigned long flags;
u32 mask = d->mask;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
irq_reg_writel(gc, mask, ct->regs.ack);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
EXPORT_SYMBOL_GPL(irq_gc_ack_set_bit);
@@ -128,11 +133,12 @@ void irq_gc_ack_clr_bit(struct irq_data *d)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct irq_chip_type *ct = irq_data_get_chip_type(d);
+ unsigned long flags;
u32 mask = ~d->mask;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
irq_reg_writel(gc, mask, ct->regs.ack);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
/**
@@ -151,13 +157,14 @@ void irq_gc_mask_disable_and_ack_set(struct irq_data *d)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct irq_chip_type *ct = irq_data_get_chip_type(d);
+ unsigned long flags;
u32 mask = d->mask;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
irq_reg_writel(gc, mask, ct->regs.disable);
*ct->mask_cache &= ~mask;
irq_reg_writel(gc, mask, ct->regs.ack);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
/**
@@ -168,11 +175,12 @@ void irq_gc_eoi(struct irq_data *d)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
struct irq_chip_type *ct = irq_data_get_chip_type(d);
+ unsigned long flags;
u32 mask = d->mask;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
irq_reg_writel(gc, mask, ct->regs.eoi);
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
}
/**
@@ -187,17 +195,18 @@ void irq_gc_eoi(struct irq_data *d)
int irq_gc_set_wake(struct irq_data *d, unsigned int on)
{
struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ unsigned long flags;
u32 mask = d->mask;
if (!(mask & gc->wake_enabled))
return -EINVAL;
- irq_gc_lock(gc);
+ flags = irq_gc_lock(gc);
if (on)
gc->wake_active |= mask;
else
gc->wake_active &= ~mask;
- irq_gc_unlock(gc);
+ irq_gc_unlock(gc, flags);
return 0;
}
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 7057b60afabe..9da255ee5c81 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -66,6 +66,7 @@ enum {
IRQS_TIMINGS = 0x00001000,
IRQS_NMI = 0x00002000,
IRQS_SYSFS = 0x00004000,
+ IPIPE_IRQS_NEEDS_STARTUP= 0x80000000,
};
#include "debug.h"
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 0272a2e36ae6..24110904060e 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -125,6 +125,9 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node,
for_each_possible_cpu(cpu)
*per_cpu_ptr(desc->kstat_irqs, cpu) = 0;
desc_smp_init(desc, node, affinity);
+#ifdef CONFIG_IPIPE
+ desc->istate |= IPIPE_IRQS_NEEDS_STARTUP;
+#endif
}
int nr_irqs = NR_IRQS;
@@ -583,11 +586,13 @@ int __init early_irq_init(void)
return arch_early_irq_init();
}
+#ifndef CONFIG_IPIPE
struct irq_desc *irq_to_desc(unsigned int irq)
{
return (irq < NR_IRQS) ? irq_desc + irq : NULL;
}
EXPORT_SYMBOL(irq_to_desc);
+#endif /* CONFIG_IPIPE */
static void free_desc(unsigned int irq)
{
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 79214f983624..03d526d38f66 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -959,9 +959,14 @@ again:
desc->threads_oneshot &= ~action->thread_mask;
+#ifndef CONFIG_IPIPE
if (!desc->threads_oneshot && !irqd_irq_disabled(&desc->irq_data) &&
irqd_irq_masked(&desc->irq_data))
unmask_threaded_irq(desc);
+#else /* CONFIG_IPIPE */
+ if (!desc->threads_oneshot && !irqd_irq_disabled(&desc->irq_data))
+ desc->ipipe_end(desc);
+#endif /* CONFIG_IPIPE */
out_unlock:
raw_spin_unlock_irq(&desc->lock);
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index b7e4c5999cc8..227b2d1444f4 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -268,6 +268,9 @@ static void msi_domain_update_chip_ops(struct msi_domain_info *info)
struct irq_chip *chip = info->chip;
BUG_ON(!chip || !chip->irq_mask || !chip->irq_unmask);
+ WARN_ONCE(IS_ENABLED(CONFIG_IPIPE) &&
+ (chip->flags & IRQCHIP_PIPELINE_SAFE) == 0,
+ "MSI domain irqchip %s is not pipeline-safe!", chip->name);
if (!chip->irq_set_affinity)
chip->irq_set_affinity = msi_domain_set_affinity;
}
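
The WARN_ONCE() added above catches MSI irqchips that have not been audited for interrupt pipelining. For context, an audited driver opts in by setting the IRQCHIP_PIPELINE_SAFE flag introduced elsewhere in this patch; the fragment below is a made-up example (my_msi_chip is a placeholder name), using stock hierarchy callbacks.

static struct irq_chip my_msi_chip = {
	.name		= "MY-MSI",
	.irq_mask	= irq_chip_mask_parent,
	.irq_unmask	= irq_chip_unmask_parent,
	.irq_ack	= irq_chip_ack_parent,
	/* Audited for pipelining: only hard-IRQ-safe operations used. */
	.flags		= IRQCHIP_PIPELINE_SAFE,
};
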
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index db109d38f301..d33833be3d8b 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3534,7 +3534,7 @@ void lockdep_hardirqs_on(unsigned long ip)
* already enabled, yet we find the hardware thinks they are in fact
* enabled.. someone messed up their IRQ state tracing.
*/
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !hard_irqs_disabled()))
return;
/*
@@ -3561,7 +3561,9 @@ NOKPROBE_SYMBOL(lockdep_hardirqs_on);
*/
void lockdep_hardirqs_off(unsigned long ip)
{
- struct task_struct *curr = current;
+ struct task_struct *curr;
+
+ curr = current;
if (unlikely(!debug_locks || current->lockdep_recursion))
return;
@@ -3570,7 +3572,7 @@ void lockdep_hardirqs_off(unsigned long ip)
* So we're supposed to get called after you mask local IRQs, but for
* some reason the hardware doesn't quite think you did a proper job.
*/
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !hard_irqs_disabled()))
return;
if (curr->hardirqs_enabled) {
@@ -3600,7 +3602,7 @@ void trace_softirqs_on(unsigned long ip)
* We fancy IRQs being disabled here, see softirq.c, avoids
* funny state and nesting things.
*/
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !hard_irqs_disabled()))
return;
if (curr->softirqs_enabled) {
@@ -3639,7 +3641,7 @@ void trace_softirqs_off(unsigned long ip)
/*
* We fancy IRQs being disabled here, see softirq.c
*/
- if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !hard_irqs_disabled()))
return;
if (curr->softirqs_enabled) {
diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
index a525368b8cf6..ce2755b52821 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -202,12 +202,12 @@ extern struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
this_cpu_inc(lockdep_stats.ptr);
#define debug_atomic_inc(ptr) { \
- WARN_ON_ONCE(!irqs_disabled()); \
+ WARN_ON_ONCE(!hard_irqs_disabled() && !irqs_disabled()); \
__this_cpu_inc(lockdep_stats.ptr); \
}
#define debug_atomic_dec(ptr) { \
- WARN_ON_ONCE(!irqs_disabled()); \
+ WARN_ON_ONCE(!hard_irqs_disabled() && !irqs_disabled());\
__this_cpu_dec(lockdep_stats.ptr); \
}
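
The four relaxed assertions above all encode the same rule: with the pipeline enabled, a critical section may be covered either by stalling the root stage, which is what irqs_disabled() observes, or by masking the CPU outright, which hard_irqs_disabled() observes, and lockdep has to accept both. Written out as a standalone predicate purely for illustration (not code from this patch):

/* Illustration only. */
static inline bool pipeline_irqs_off(void)
{
	return irqs_disabled() ||	/* root stage stalled (virtual mask) */
		hard_irqs_disabled();	/* CPU interrupt flag cleared */
}
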
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 0ff08380f531..7d7a34aa3e40 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -34,7 +34,9 @@ EXPORT_PER_CPU_SYMBOL(__mmiowb_state);
* even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
* not re-enabled during lock-acquire (which the preempt-spin-ops do):
*/
-#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)
+#if !defined(CONFIG_GENERIC_LOCKBREAK) || \
+ defined(CONFIG_DEBUG_LOCK_ALLOC) || \
+ defined(CONFIG_IPIPE)
/*
* The __lock_function inlines are taken from
* spinlock : include/linux/spinlock_api_smp.h
diff --git a/kernel/module.c b/kernel/module.c
index 30ac7514bd2b..c7d3dab84a80 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1134,7 +1134,7 @@ bool try_module_get(struct module *module)
bool ret = true;
if (module) {
- preempt_disable();
+ unsigned long flags = hard_preempt_disable();
/* Note: here, we can fail to get a reference */
if (likely(module_is_live(module) &&
atomic_inc_not_zero(&module->refcnt) != 0))
@@ -1142,7 +1142,7 @@ bool try_module_get(struct module *module)
else
ret = false;
- preempt_enable();
+ hard_preempt_enable(flags);
}
return ret;
}
@@ -1153,11 +1153,11 @@ void module_put(struct module *module)
int ret;
if (module) {
- preempt_disable();
+ unsigned long flags = hard_preempt_disable();
ret = atomic_dec_if_positive(&module->refcnt);
WARN_ON(ret < 0); /* Failed to put refcount */
trace_module_put(module, _RET_IP_);
- preempt_enable();
+ hard_preempt_enable(flags);
}
}
EXPORT_SYMBOL(module_put);
diff --git a/kernel/notifier.c b/kernel/notifier.c
index f6d5ffe4e72e..e1c427bfacc8 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -6,6 +6,7 @@
#include <linux/rcupdate.h>
#include <linux/vmalloc.h>
#include <linux/reboot.h>
+#include <linux/ipipe.h>
/*
* Notifier list for kernel code which wants to be called
@@ -195,6 +196,9 @@ NOKPROBE_SYMBOL(__atomic_notifier_call_chain);
int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
unsigned long val, void *v)
{
+ if (!ipipe_root_p)
+ return notifier_call_chain(&nh->head, val, v, -1, NULL);
+
return __atomic_notifier_call_chain(nh, val, v, -1, NULL);
}
EXPORT_SYMBOL_GPL(atomic_notifier_call_chain);
diff --git a/kernel/panic.c b/kernel/panic.c
index cef79466f941..c767d2e8b2dc 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -22,8 +22,10 @@
#include <linux/ftrace.h>
#include <linux/reboot.h>
#include <linux/delay.h>
+#include <linux/ipipe_trace.h>
#include <linux/kexec.h>
#include <linux/sched.h>
+#include <linux/ipipe.h>
#include <linux/sysrq.h>
#include <linux/init.h>
#include <linux/nmi.h>
@@ -577,6 +579,8 @@ void oops_enter(void)
{
tracing_off();
/* can't trust the integrity of the kernel anymore: */
+ ipipe_trace_panic_freeze();
+ ipipe_disable_context_check();
debug_locks_off();
do_oops_enter_exit();
}
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index f8934f9746e6..6145079049dd 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -290,6 +290,7 @@ static int create_image(int platform_mode)
goto Enable_cpus;
local_irq_disable();
+ hard_cond_local_irq_disable();
system_state = SYSTEM_SUSPEND;
@@ -457,6 +458,7 @@ static int resume_target_kernel(bool platform_mode)
local_irq_disable();
system_state = SYSTEM_SUSPEND;
+ hard_cond_local_irq_disable();
error = syscore_suspend();
if (error)
@@ -578,6 +580,7 @@ int hibernation_platform_enter(void)
local_irq_disable();
system_state = SYSTEM_SUSPEND;
+ hard_cond_local_irq_disable();
syscore_suspend();
if (pm_wakeup_pending()) {
error = -EAGAIN;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bb2198b40756..96d1b72c4bb8 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -40,6 +40,7 @@
#include <linux/kmsg_dump.h>
#include <linux/syslog.h>
#include <linux/cpu.h>
+#include <linux/ipipe.h>
#include <linux/rculist.h>
#include <linux/poll.h>
#include <linux/irq_work.h>
@@ -2033,10 +2034,116 @@ asmlinkage int vprintk_emit(int facility, int level,
}
EXPORT_SYMBOL(vprintk_emit);
-asmlinkage int vprintk(const char *fmt, va_list args)
+#ifdef CONFIG_IPIPE
+
+extern int __ipipe_printk_bypass;
+
+static IPIPE_DEFINE_SPINLOCK(__ipipe_printk_lock);
+
+static int __ipipe_printk_fill;
+
+static char __ipipe_printk_buf[__LOG_BUF_LEN];
+
+int __ipipe_log_printk(const char *fmt, va_list args)
+{
+ int ret = 0, fbytes, oldcount;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&__ipipe_printk_lock, flags);
+
+ oldcount = __ipipe_printk_fill;
+ fbytes = __LOG_BUF_LEN - oldcount;
+ if (fbytes > 1) {
+ ret = vscnprintf(__ipipe_printk_buf + __ipipe_printk_fill,
+ fbytes, fmt, args) + 1;
+ __ipipe_printk_fill += ret;
+ }
+
+ raw_spin_unlock_irqrestore(&__ipipe_printk_lock, flags);
+
+ if (oldcount == 0)
+ ipipe_raise_irq(__ipipe_printk_virq);
+
+ return ret;
+}
+
+static void do_deferred_vprintk(const char *fmt, ...)
+{
+ va_list args;
+
+ va_start(args, fmt);
+ vprintk_func(fmt, args);
+ va_end(args);
+}
+
+void __ipipe_flush_printk(unsigned virq, void *cookie)
+{
+ char *p = __ipipe_printk_buf;
+ int len, lmax, out = 0;
+ unsigned long flags;
+
+ goto start;
+ do {
+ raw_spin_unlock_irqrestore(&__ipipe_printk_lock, flags);
+start:
+ lmax = __ipipe_printk_fill;
+ while (out < lmax) {
+ len = strlen(p) + 1;
+ do_deferred_vprintk("%s", p);
+ p += len;
+ out += len;
+ }
+ raw_spin_lock_irqsave(&__ipipe_printk_lock, flags);
+ } while (__ipipe_printk_fill != lmax);
+
+ __ipipe_printk_fill = 0;
+
+ raw_spin_unlock_irqrestore(&__ipipe_printk_lock, flags);
+}
+
+static int do_vprintk(const char *fmt, va_list args)
+{
+ int sprintk = 1, cs = -1;
+ unsigned long flags;
+ int ret;
+
+ flags = hard_local_irq_save();
+
+ if (__ipipe_printk_bypass || oops_in_progress)
+ cs = ipipe_disable_context_check();
+ else if (__ipipe_current_domain == ipipe_root_domain) {
+ if (ipipe_head_domain != ipipe_root_domain &&
+ (raw_irqs_disabled_flags(flags) ||
+ test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status)))
+ sprintk = 0;
+ } else
+ sprintk = 0;
+
+ hard_local_irq_restore(flags);
+
+ if (sprintk) {
+ ret = vprintk_func(fmt, args);
+ if (cs != -1)
+ ipipe_restore_context_check(cs);
+ } else
+ ret = __ipipe_log_printk(fmt, args);
+
+ return ret;
+}
+
+#else /* !CONFIG_IPIPE */
+
+static int do_vprintk(const char *fmt, va_list args)
{
return vprintk_func(fmt, args);
}
+
+#endif /* !CONFIG_IPIPE */
+
+asmlinkage int vprintk(const char *fmt, va_list args)
+{
+ return do_vprintk(fmt, args);
+}
EXPORT_SYMBOL(vprintk);
int vprintk_default(const char *fmt, va_list args)
@@ -2083,7 +2190,7 @@ asmlinkage __visible int printk(const char *fmt, ...)
int r;
va_start(args, fmt);
- r = vprintk_func(fmt, args);
+ r = do_vprintk(fmt, args);
va_end(args);
return r;
@@ -2144,6 +2251,63 @@ asmlinkage __visible void early_printk(const char *fmt, ...)
}
#endif
+#ifdef CONFIG_RAW_PRINTK
+static struct console *raw_console;
+static IPIPE_DEFINE_RAW_SPINLOCK(raw_console_lock);
+
+void raw_vprintk(const char *fmt, va_list ap)
+{
+ unsigned long flags;
+ char buf[256];
+ int n;
+
+ if (raw_console == NULL || console_suspended)
+ return;
+
+ n = vscnprintf(buf, sizeof(buf), fmt, ap);
+ touch_nmi_watchdog();
+ raw_spin_lock_irqsave(&raw_console_lock, flags);
+ if (raw_console)
+ raw_console->write_raw(raw_console, buf, n);
+ raw_spin_unlock_irqrestore(&raw_console_lock, flags);
+}
+
+asmlinkage __visible void raw_printk(const char *fmt, ...)
+{
+ va_list ap;
+
+ va_start(ap, fmt);
+ raw_vprintk(fmt, ap);
+ va_end(ap);
+}
+EXPORT_SYMBOL(raw_printk);
+
+static inline void register_raw_console(struct console *newcon)
+{
+ if ((newcon->flags & CON_RAW) != 0 && newcon->write_raw)
+ raw_console = newcon;
+}
+
+static inline void unregister_raw_console(struct console *oldcon)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&raw_console_lock, flags);
+ if (oldcon == raw_console)
+ raw_console = NULL;
+ raw_spin_unlock_irqrestore(&raw_console_lock, flags);
+}
+
+#else
+
+static inline void register_raw_console(struct console *newcon)
+{ }
+
+static inline void unregister_raw_console(struct console *oldcon)
+{ }
+
+#endif
+
static int __add_preferred_console(char *name, int idx, char *options,
char *brl_options)
{
@@ -2801,6 +2965,9 @@ void register_console(struct console *newcon)
console_drivers->next = newcon;
}
+ /* The most recently registered raw console becomes current. */
+ register_raw_console(newcon);
+
if (newcon->flags & CON_EXTENDED)
nr_ext_console_drivers++;
@@ -2860,6 +3027,8 @@ int unregister_console(struct console *console)
(console->flags & CON_BOOT) ? "boot" : "" ,
console->name, console->index);
+ unregister_raw_console(console);
+
res = _braille_unregister_console(console);
if (res)
return res;
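
Besides deferring printk() output issued from the head domain to a root-stage VIRQ, the CONFIG_RAW_PRINTK block above gives console drivers a polled, lock-free output path that raw_printk() can use from any domain, even with hard IRQs off. A hedged sketch of how a UART driver might opt in follows; CON_RAW and the write_raw hook come from other parts of this patch, while every myuart_* identifier is invented for the example.

/* Hypothetical driver primitives standing in for a real UART's
 * regular and polled output paths. */
static void myuart_write(struct console *con, const char *s, unsigned int count);
static void myuart_poll_put_char(struct console *con, char c);

static void myuart_write_raw(struct console *con,
			     const char *s, unsigned int count)
{
	/* Busy-wait on the TX FIFO; no locks, no interrupts needed. */
	while (count--)
		myuart_poll_put_char(con, *s++);
}

static struct console myuart_console = {
	.name		= "ttyMY",
	.write		= myuart_write,		/* regular path */
	.write_raw	= myuart_write_raw,	/* raw_printk() path */
	.flags		= CON_PRINTBUFFER | CON_RAW,
	.index		= -1,
};

Registering such a console through register_console() then makes it the current raw console via the register_raw_console() hook above.
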
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index aab480e24bd6..7b6228ab8878 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -862,6 +862,8 @@ static int ptrace_resume(struct task_struct *child, long request,
user_disable_single_step(child);
}
+ __ipipe_report_ptrace_resume(child, request);
+
/*
* Change ->exit_code and ->state under siglock to avoid the race
* with wait_task_stopped() in between; a non-zero ->exit_code will
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 4aa02eee8f6c..97216dbbdeee 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -6,7 +6,7 @@
menu "RCU Debugging"
config PROVE_RCU
- def_bool PROVE_LOCKING
+ def_bool PROVE_LOCKING && !IPIPE
config PROVE_RCU_LIST
bool "RCU list lockdep debugging"
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8ab239fd1c8d..57b55fae4ffb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1770,8 +1770,12 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
}
/* Can the task run on the task's current CPU? If so, we're done */
- if (cpumask_test_cpu(task_cpu(p), new_mask))
+ if (cpumask_test_cpu(task_cpu(p), new_mask)) {
+ __ipipe_report_setaffinity(p, task_cpu(p));
goto out;
+ }
+
+ __ipipe_report_setaffinity(p, dest_cpu);
if (task_running(rq, p) || p->state == TASK_WAKING) {
struct migration_arg arg = { p, dest_cpu };
@@ -2429,7 +2433,9 @@ void scheduler_ipi(void)
* however a fair share of IPIs are still resched only so this would
* somewhat pessimize the simple resched case.
*/
+#ifndef IPIPE_ARCH_HAVE_VIRQ_IPI
irq_enter();
+#endif
sched_ttwu_pending();
/*
@@ -2439,7 +2445,9 @@ void scheduler_ipi(void)
this_rq()->idle_balance = 1;
raise_softirq_irqoff(SCHED_SOFTIRQ);
}
+#ifndef IPIPE_ARCH_HAVE_VIRQ_IPI
irq_exit();
+#endif
}
static void ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
@@ -2648,7 +2656,7 @@ try_to_wake_up(struct task_struct *p, un
*/
raw_spin_lock_irqsave(&p->pi_lock, flags);
smp_mb__after_spinlock();
- if (!(p->state & state))
+ if (!(p->state & state) || (p->state & (TASK_NOWAKEUP|TASK_HARDENING)))
goto unlock;
trace_sched_waking(p);
@@ -3416,6 +3424,7 @@ asmlinkage __visible void schedule_tail(
* PREEMPT_COUNT kernels).
*/
+ __ipipe_complete_domain_migration();
rq = finish_task_switch(prev);
balance_callback(rq);
preempt_enable();
@@ -3429,10 +3438,21 @@ asmlinkage __visible void schedule_tail(
/*
* context_switch - switch to the new MM and the new thread's register state.
*/
-static __always_inline struct rq *
+struct rq *
context_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next, struct rq_flags *rf)
{
+ if (likely(!rq)) {
+ struct mm_struct *mm = next->mm;
+ struct mm_struct *oldmm = prev->active_mm;
+ switch_mm_irqs_off(oldmm, next->active_mm, next);
+ if (!mm)
+ enter_lazy_tlb(oldmm, next);
+ smp_mb();
+ switch_to(prev, next, prev);
+ barrier();
+ return NULL;
+ }
prepare_task_switch(rq, prev, next);
/*
@@ -3484,8 +3504,12 @@ context_switch(struct rq *rq, struct tas
switch_to(prev, next, prev);
barrier();
+ if (unlikely(__ipipe_switch_tail()))
+ return NULL;
+
return finish_task_switch(prev);
}
+EXPORT_SYMBOL(context_switch);
/*
* nr_running and nr_context_switches:
@@ -3976,6 +4000,7 @@ static noinline void __schedule_bug(stru
*/
static inline void schedule_debug(struct task_struct *prev, bool preempt)
{
+ ipipe_root_only();
#ifdef CONFIG_SCHED_STACK_END_CHECK
if (task_stack_end_corrupted(prev))
panic("corrupted stack end detected inside scheduler\n");
@@ -4098,7 +4123,7 @@ restart:
*
* WARNING: must be called with preemption disabled!
*/
-static void __sched notrace __schedule(bool preempt)
+static bool __sched notrace __schedule(bool preempt)
{
struct task_struct *prev, *next;
unsigned long *switch_count;
@@ -4179,12 +4204,17 @@ static void __sched notrace __schedule(b
/* Also unlocks the rq: */
rq = context_switch(rq, prev, next, &rf);
+ if (rq == NULL)
+ return true; /* task hijacked by head domain */
} else {
+ prev->state &= ~TASK_HARDENING;
rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
rq_unlock_irq(rq, &rf);
}
balance_callback(rq);
+
+ return false;
}
void __noreturn do_task_dead(void)
@@ -4246,7 +4276,8 @@ asmlinkage __visible void __sched schedu
sched_submit_work(tsk);
do {
preempt_disable();
- __schedule(false);
+ if (__schedule(false))
+ return;
sched_preempt_enable_no_resched();
} while (need_resched());
sched_update_worker(tsk);
@@ -4327,7 +4358,8 @@ static void __sched notrace preempt_sche
*/
preempt_disable_notrace();
preempt_latency_start(1);
- __schedule(true);
+ if (__schedule(true))
+ return;
preempt_latency_stop(1);
preempt_enable_no_resched_notrace();
@@ -4349,7 +4381,7 @@ asmlinkage __visible void __sched notrac
* If there is a non-zero preempt_count or interrupts are disabled,
* we do not want to preempt the current task. Just return..
*/
- if (likely(!preemptible()))
+ if (likely(!preemptible() || !ipipe_root_p))
return;
preempt_schedule_common();
@@ -4375,7 +4407,7 @@ asmlinkage __visible void __sched notrac
{
enum ctx_state prev_ctx;
- if (likely(!preemptible()))
+ if (likely(!preemptible() || !ipipe_root_p || hard_irqs_disabled()))
return;
do {
@@ -5082,6 +5114,7 @@ change:
__setscheduler(rq, p, attr, pi);
__setscheduler_uclamp(p, attr);
+ __ipipe_report_setsched(p);
if (queued) {
/*
@@ -6642,6 +6675,43 @@ int in_sched_functions(unsigned long add
&& addr < (unsigned long)__sched_text_end);
}
+#ifdef CONFIG_IPIPE
+
+int __ipipe_migrate_head(void)
+{
+ struct task_struct *p = current;
+
+ preempt_disable();
+
+ IPIPE_WARN_ONCE(__this_cpu_read(ipipe_percpu.task_hijacked) != NULL);
+
+ __this_cpu_write(ipipe_percpu.task_hijacked, p);
+ set_current_state(TASK_INTERRUPTIBLE | TASK_HARDENING);
+ sched_submit_work(p);
+ if (likely(__schedule(false)))
+ return 0;
+
+ preempt_enable();
+ return -ERESTARTSYS;
+}
+EXPORT_SYMBOL_GPL(__ipipe_migrate_head);
+
+void __ipipe_reenter_root(void)
+{
+ struct rq *rq;
+ struct task_struct *p;
+
+ p = __this_cpu_read(ipipe_percpu.rqlock_owner);
+ BUG_ON(p == NULL);
+ ipipe_clear_thread_flag(TIP_HEAD);
+ rq = finish_task_switch(p);
+ balance_callback(rq);
+ preempt_enable_no_resched_notrace();
+}
+EXPORT_SYMBOL_GPL(__ipipe_reenter_root);
+
+#endif /* CONFIG_IPIPE */
+
#ifdef CONFIG_CGROUP_SCHED
/*
* Default task group.
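
The scheduler changes above let __schedule() report whether the outgoing task was hijacked by the head domain and export __ipipe_migrate_head()/__ipipe_reenter_root() so a co-kernel can move threads across domains. The fragment below is a rough, hypothetical view of the calling side, not code from this patch; a real core such as Xenomai wraps considerably more logic around these calls.

/*
 * Hypothetical co-kernel helper: hand the current task over to the
 * head domain scheduler.  Returning 0 means the task resumed under
 * head-domain control; -ERESTARTSYS means the migration was aborted
 * (for instance by a pending signal) and we are still in-band.
 */
static int cokernel_harden_current(void)
{
	int ret;

	ret = __ipipe_migrate_head();
	if (ret)
		return ret;	/* still scheduled by the root stage */

	/* From here on, the head domain scheduler owns this task; the
	 * eventual switch back to in-band context is finalized by
	 * __ipipe_reenter_root(). */
	return 0;
}
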
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 3f8c7867c14c..dd09054ef430 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -77,22 +77,29 @@ void __weak arch_cpu_idle_dead(void) { }
void __weak arch_cpu_idle(void)
{
cpu_idle_force_poll = 1;
- local_irq_enable();
+ local_irq_enable_full();
}
/**
* default_idle_call - Default CPU idle routine.
*
* To use when the cpuidle framework cannot be used.
+ *
+ * When interrupts are pipelined, this call is entered with hard irqs
+ * on and the root stage stalled; it returns with hard irqs on and the
+ * root stage unstalled.
*/
void __cpuidle default_idle_call(void)
{
if (current_clr_polling_and_test()) {
- local_irq_enable();
+ local_irq_enable_full();
} else {
- stop_critical_timings();
- arch_cpu_idle();
- start_critical_timings();
+ if (ipipe_enter_cpuidle(NULL, NULL)) {
+ stop_critical_timings();
+ arch_cpu_idle();
+ start_critical_timings();
+ } else
+ local_irq_enable_full();
}
}
@@ -208,6 +215,15 @@ static void cpuidle_idle_call(void)
exit_idle:
__current_set_polling();
+#ifdef CONFIG_IPIPE
+ /*
+ * Catch mishandling of the CPU's interrupt disable flag when
+ * pipelining IRQs.
+ */
+ if (WARN_ON_ONCE(hard_irqs_disabled()))
+ hard_local_irq_enable();
+#endif
+
/*
* It is up to the idle functions to reenable local interrupts
*/
@@ -262,6 +278,9 @@ static void do_idle(void)
cpu_idle_poll();
} else {
cpuidle_idle_call();
+#ifdef CONFIG_IPIPE
+ WARN_ON_ONCE(hard_irqs_disabled());
+#endif
}
arch_cpu_idle_exit();
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b8a3db59e326..131ac6fd5984 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -65,6 +65,7 @@
#include <linux/syscalls.h>
#include <linux/task_work.h>
#include <linux/tsacct_kern.h>
+#include <linux/ipipe.h>
#include <asm/tlb.h>
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 7d668b31dbc6..fc20cee8e587 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -80,6 +80,8 @@ static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
} else
curr = list_first_entry(&wq_head->head, wait_queue_entry_t, entry);
+ ipipe_root_only();
+
if (&curr->entry == &wq_head->head)
return nr_exclusive;
diff --git a/kernel/signal.c b/kernel/signal.c
index 1f4293a107b4..e46e3d88d0b4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -35,6 +35,7 @@
#include <linux/tracehook.h>
#include <linux/capability.h>
#include <linux/freezer.h>
+#include <linux/ipipe.h>
#include <linux/pid_namespace.h>
#include <linux/nsproxy.h>
#include <linux/user_namespace.h>
@@ -760,6 +761,10 @@ still_pending:
void signal_wake_up_state(struct task_struct *t, unsigned int state)
{
set_tsk_thread_flag(t, TIF_SIGPENDING);
+
+ /* TIF_SIGPENDING must be set prior to reporting. */
+ __ipipe_report_sigwake(t);
+
/*
* TASK_WAKEKILL also means wake it up in the stopped/traced/killable
* case. We don't check t->state here because there is a race with it
@@ -981,8 +986,11 @@ static inline bool wants_signal(int sig, struct task_struct *p)
if (sig == SIGKILL)
return true;
- if (task_is_stopped_or_traced(p))
+ if (task_is_stopped_or_traced(p)) {
+ if (!signal_pending(p))
+ __ipipe_report_sigwake(p);
return false;
+ }
return task_curr(p) || !signal_pending(p);
}
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 998d50ee2d9b..d93f9c9d63d1 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -237,6 +237,7 @@ static int multi_cpu_stop(void *data)
}
} while (curstate != MULTI_STOP_EXIT);
+ hard_irq_enable();
local_irq_restore(flags);
return err;
}
@@ -618,6 +619,7 @@ int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
local_irq_save(flags);
hard_irq_disable();
ret = (*fn)(data);
+ hard_irq_enable();
local_irq_restore(flags);
return ret;
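
Both stop_machine hunks above apply the same bracketing discipline: local_irq_save()/restore() keeps the root stage quiescent through its virtual flag, while hard_irq_disable()/hard_irq_enable() additionally masks the CPU so even head-domain interrupts are held off for the duration of the callback. The pattern, spelled out as a standalone sketch:

static void exclusive_section(void)
{
	unsigned long flags;

	local_irq_save(flags);	/* stall the root stage (virtual mask) */
	hard_irq_disable();	/* mask the CPU: head-domain IRQs held off too */

	/* ...work that must be atomic with respect to both domains... */

	hard_irq_enable();	/* let head-domain IRQs in again */
	local_irq_restore(flags); /* unstall the root stage */
}
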
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index f5490222e134..ba06c41e901e 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -13,6 +13,7 @@
#include <linux/module.h>
#include <linux/smp.h>
#include <linux/device.h>
+#include <linux/ipipe_tickdev.h>
#include "tick-internal.h"
@@ -448,6 +449,8 @@ void clockevents_register_device(struct clock_event_device *dev)
/* Initialize state to DETACHED */
clockevent_set_state(dev, CLOCK_EVT_STATE_DETACHED);
+ ipipe_host_timer_register(dev);
+
if (!dev->cpumask) {
WARN_ON(num_possible_cpus() > 1);
dev->cpumask = cpumask_of(smp_processor_id());
@@ -642,8 +645,10 @@ void tick_cleanup_dead_cpu(int cpu)
* Unregister the clock event devices which were
* released from the users in the notify chain.
*/
- list_for_each_entry_safe(dev, tmp, &clockevents_released, list)
+ list_for_each_entry_safe(dev, tmp, &clockevents_released, list) {
list_del(&dev->list);
+ ipipe_host_timer_cleanup(dev);
+ }
/*
* Now check whether the CPU has left unused per cpu devices
*/
@@ -653,6 +658,7 @@ void tick_cleanup_dead_cpu(int cpu)
!tick_is_broadcast_device(dev)) {
BUG_ON(!clockevent_state_detached(dev));
list_del(&dev->list);
+ ipipe_host_timer_cleanup(dev);
}
}
raw_spin_unlock_irqrestore(&clockevents_lock, flags);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 16a2b62f5f74..2767f08bbe43 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -21,6 +21,7 @@
#include <linux/kernel_stat.h>
#include <linux/export.h>
#include <linux/interrupt.h>
+#include <linux/ipipe.h>
#include <linux/percpu.h>
#include <linux/init.h>
#include <linux/mm.h>
@@ -1726,6 +1727,15 @@ static inline int collect_expired_timers(struct timer_base *base,
}
#endif
+static inline void do_account_tick(struct task_struct *p, int user_tick)
+{
+#ifdef CONFIG_IPIPE
+ if (!__ipipe_root_tick_p(raw_cpu_ptr(&ipipe_percpu.tick_regs)))
+ return;
+#endif
+ account_process_tick(p, user_tick);
+}
+
/*
* Called from the timer interrupt handler to charge one tick to the current
* process. user_tick is 1 if the tick is user time, 0 for system.
@@ -1735,7 +1745,7 @@ void update_process_times(int user_tick)
struct task_struct *p = current;
/* Note: this timer irq context must be accounted for as well. */
- account_process_tick(p, user_tick);
+ do_account_tick(p, user_tick);
run_local_timers();
rcu_sched_clock_irq(user_tick);
#ifdef CONFIG_IRQ_WORK
diff --git a/kernel/time/vsyscall.c b/kernel/time/vsyscall.c
index 9577c89179cd..b51b410c92ca 100644
--- a/kernel/time/vsyscall.c
+++ b/kernel/time/vsyscall.c
@@ -9,6 +9,7 @@
#include <linux/hrtimer.h>
#include <linux/timekeeper_internal.h>
+#include <linux/ipipe_tickdev.h>
#include <vdso/datapage.h>
#include <vdso/helpers.h>
#include <vdso/vsyscall.h>
@@ -73,6 +74,8 @@ void update_vsyscall(struct timekeeper *tk)
struct vdso_timestamp *vdso_ts;
u64 nsec;
+ ipipe_update_hostrt(tk);
+
/* copy vsyscall data */
vdso_write_begin(vdata);
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 9fa01dad655b..f0360083525e 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -532,6 +532,7 @@ config DYNAMIC_FTRACE
bool "enable/disable function tracing dynamically"
depends on FUNCTION_TRACER
depends on HAVE_DYNAMIC_FTRACE
+ depends on !IPIPE
default y
help
This option will modify all the calls to function tracing
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8e3c76dcc0ff..e4db33df6f66 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -35,6 +35,7 @@
#include <linux/hash.h>
#include <linux/rcupdate.h>
#include <linux/kprobes.h>
+#include <linux/ipipe.h>
#include <trace/events/sched.h>
@@ -184,8 +185,17 @@ static ftrace_func_t ftrace_ops_get_list_func(struct ftrace_ops *ops)
static void update_ftrace_function(void)
{
+ struct ftrace_ops *ops;
ftrace_func_t func;
+ for (ops = ftrace_ops_list;
+ ops != &ftrace_list_end; ops = ops->next)
+ if (ops->flags & FTRACE_OPS_FL_IPIPE_EXCLUSIVE) {
+ set_function_trace_op = ops;
+ func = ops->func;
+ goto set_pointers;
+ }
+
/*
* Prepare the ftrace_ops that the arch callback will use.
* If there's only one ftrace_ops registered, the ftrace_ops_list
@@ -215,6 +225,7 @@ static void update_ftrace_function(void)
update_function_graph_func();
+ set_pointers:
/* If there's no change, then do nothing more here */
if (ftrace_trace_function == func)
return;
@@ -2639,6 +2650,9 @@ void __weak arch_ftrace_update_code(int command)
static void ftrace_run_update_code(int command)
{
+#ifdef CONFIG_IPIPE
+ unsigned long flags;
+#endif /* CONFIG_IPIPE */
int ret;
ret = ftrace_arch_code_modify_prepare();
@@ -5704,10 +5718,10 @@ static int ftrace_process_locs(struct module *mod,
* reason to cause large interrupt latencies while we do it.
*/
if (!mod)
- local_irq_save(flags);
+ flags = hard_local_irq_save();
ftrace_update_code(mod, start_pg);
if (!mod)
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
ret = 0;
out:
mutex_unlock(&ftrace_lock);
@@ -6248,9 +6262,11 @@ void __init ftrace_init(void)
unsigned long count, flags;
int ret;
- local_irq_save(flags);
+ flags = hard_local_irq_save_notrace();
ret = ftrace_dyn_arch_init();
- local_irq_restore(flags);
+ hard_local_irq_restore_notrace(flags);
+
+ /* a nonzero return from ftrace_dyn_arch_init() means failure */
if (ret)
goto failed;
@@ -6385,7 +6401,16 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
}
} while_for_each_ftrace_op(op);
out:
- preempt_enable_notrace();
+#ifdef CONFIG_IPIPE
+ if (hard_irqs_disabled() || !ipipe_root_p)
+ /*
+ * Nothing urgent to schedule here. At the latest, the timer tick
+ * will pick up whatever the tracing functions kicked off.
+ */
+ preempt_enable_no_resched_notrace();
+ else
+#endif
+ preempt_enable_notrace();
trace_clear_recursion(bit);
}
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 58809fffc817..c1b7cc8f0153 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2761,8 +2761,9 @@ rb_wakeups(struct ring_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
static __always_inline int
trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
{
- unsigned int val = cpu_buffer->current_context;
unsigned long pc = preempt_count();
+ unsigned long flags;
+ unsigned int val;
int bit;
if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
@@ -2771,6 +2772,10 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
bit = pc & NMI_MASK ? RB_CTX_NMI :
pc & HARDIRQ_MASK ? RB_CTX_IRQ : RB_CTX_SOFTIRQ;
+ flags = hard_local_irq_save();
+
+ val = cpu_buffer->current_context;
+
if (unlikely(val & (1 << (bit + cpu_buffer->nest)))) {
/*
* It is possible that this was called by transitioning
@@ -2778,21 +2783,29 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
* been updated yet. In this case, use the TRANSITION bit.
*/
bit = RB_CTX_TRANSITION;
- if (val & (1 << (bit + cpu_buffer->nest)))
+ if (val & (1 << (bit + cpu_buffer->nest))) {
+ hard_local_irq_restore(flags);
return 1;
+ }
}
val |= (1 << (bit + cpu_buffer->nest));
cpu_buffer->current_context = val;
+ hard_local_irq_restore(flags);
+
return 0;
}
static __always_inline void
trace_recursive_unlock(struct ring_buffer_per_cpu *cpu_buffer)
{
+ unsigned long flags;
+
+ flags = hard_local_irq_save();
cpu_buffer->current_context &=
cpu_buffer->current_context - (1 << cpu_buffer->nest);
+ hard_local_irq_restore(flags);
}
/* The recursive locking above uses 5 bits */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 306fbe14747b..fe49abbdf6f8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3151,8 +3151,9 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
/* Don't pollute graph traces with trace_vprintk internals */
pause_graph_tracing();
+ flags = hard_local_irq_save();
+
pc = preempt_count();
- preempt_disable_notrace();
tbuffer = get_trace_buf();
if (!tbuffer) {
@@ -3165,7 +3166,6 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
if (len > TRACE_BUF_SIZE/sizeof(int) || len < 0)
goto out;
- local_save_flags(flags);
size = sizeof(*entry) + sizeof(u32) * len;
buffer = tr->trace_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_BPRINT, size,
@@ -3186,7 +3186,7 @@ out:
put_trace_buf();
out_nobuffer:
- preempt_enable_notrace();
+ hard_local_irq_restore(flags);
unpause_graph_tracing();
return len;
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 4702efb00ff2..519a8d8f4e2e 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -97,7 +97,7 @@ u64 notrace trace_clock_global(void)
int this_cpu;
u64 now, prev_time;
- raw_local_irq_save(flags);
+ flags = hard_local_irq_save_notrace();
this_cpu = raw_smp_processor_id();
@@ -139,7 +139,7 @@ u64 notrace trace_clock_global(void)
arch_spin_unlock(&trace_clock_struct.lock);
}
out:
- raw_local_irq_restore(flags);
+ hard_local_irq_restore_notrace(flags);
return now;
}
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 4e8acfe3437f..950a5905f97a 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -190,7 +190,7 @@ function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
* Need to use raw, since this must be called before the
* recursive protection is performed.
*/
- local_irq_save(flags);
+ flags = hard_local_irq_save();
cpu = raw_smp_processor_id();
data = per_cpu_ptr(tr->trace_buffer.data, cpu);
disabled = atomic_inc_return(&data->disabled);
@@ -202,7 +202,7 @@ function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
}
atomic_dec(&data->disabled);
- local_irq_restore(flags);
+ hard_local_irq_restore(flags);
}
static struct tracer_opt func_opts[] = {
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 78af97163147..1acb41d810cd 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -169,7 +169,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
if (tracing_thresh)
return 1;
- local_irq_save(flags);
+ flags = hard_local_irq_save_notrace();
cpu = raw_smp_processor_id();
data = per_cpu_ptr(tr->trace_buffer.data, cpu);
disabled = atomic_inc_return(&data->disabled);
@@ -181,7 +181,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
}
atomic_dec(&data->disabled);
- local_irq_restore(flags);
+ hard_local_irq_restore_notrace(flags);
return ret;
}
@@ -250,7 +250,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
return;
}
- local_irq_save(flags);
+ flags = hard_local_irq_save_notrace();
cpu = raw_smp_processor_id();
data = per_cpu_ptr(tr->trace_buffer.data, cpu);
disabled = atomic_inc_return(&data->disabled);
@@ -259,7 +259,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
__trace_graph_return(tr, trace, flags, pc);
}
atomic_dec(&data->disabled);
- local_irq_restore(flags);
+ hard_local_irq_restore_notrace(flags);
}
void set_graph_array(struct trace_array *tr)
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index e9645f829b94..9451c1648c46 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -21,6 +21,9 @@ static DEFINE_PER_CPU(int, tracing_irq_cpu);
void trace_hardirqs_on(void)
{
+ if (!ipipe_root_p)
+ return;
+
if (this_cpu_read(tracing_irq_cpu)) {
if (!in_nmi())
trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
@@ -35,6 +38,9 @@ NOKPROBE_SYMBOL(trace_hardirqs_on);
void trace_hardirqs_off(void)
{
+ if (!ipipe_root_p)
+ return;
+
if (!this_cpu_read(tracing_irq_cpu)) {
this_cpu_write(tracing_irq_cpu, 1);
tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
@@ -49,6 +55,9 @@ NOKPROBE_SYMBOL(trace_hardirqs_off);
__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
{
+ if (!ipipe_root_p)
+ return;
+
if (this_cpu_read(tracing_irq_cpu)) {
if (!in_nmi())
trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
@@ -61,8 +70,33 @@ __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
EXPORT_SYMBOL(trace_hardirqs_on_caller);
NOKPROBE_SYMBOL(trace_hardirqs_on_caller);
+__visible void trace_hardirqs_on_virt_caller(unsigned long ip)
+{
+ /*
+ * The IRQ tracing logic only applies to the root domain, and
+ * must consider the virtual disable flag exclusively when
+ * leaving an interrupt/fault context.
+ */
+ if (ipipe_root_p && !irqs_disabled())
+ trace_hardirqs_on_caller(ip);
+}
+
+__visible void trace_hardirqs_on_virt(void)
+{
+ /*
+ * The IRQ tracing logic only applies to the root domain, and
+ * must consider the virtual disable flag exclusively when
+ * leaving an interrupt/fault context.
+ */
+ if (ipipe_root_p && !irqs_disabled())
+ trace_hardirqs_on_caller(CALLER_ADDR0);
+}
+
__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
{
+ if (!ipipe_root_p)
+ return;
+
lockdep_hardirqs_off(caller_addr);
if (!this_cpu_read(tracing_irq_cpu)) {
@@ -80,14 +114,14 @@ NOKPROBE_SYMBOL(trace_hardirqs_off_caller);
void trace_preempt_on(unsigned long a0, unsigned long a1)
{
- if (!in_nmi())
+ if (ipipe_root_p && !in_nmi())
trace_preempt_enable_rcuidle(a0, a1);
tracer_preempt_on(a0, a1);
}
void trace_preempt_off(unsigned long a0, unsigned long a1)
{
- if (!in_nmi())
+ if (ipipe_root_p && !in_nmi())
trace_preempt_disable_rcuidle(a0, a1);
tracer_preempt_off(a0, a1);
}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6d79e7c3219c..4a441c8e615b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -408,6 +408,7 @@ config MAGIC_SYSRQ
keys are documented in <file:Documentation/admin-guide/sysrq.rst>.
Don't say Y unless you really know what this hack does.
+
config MAGIC_SYSRQ_DEFAULT_ENABLE
hex "Enable magic SysRq key functions by default"
depends on MAGIC_SYSRQ
@@ -427,6 +428,8 @@ config MAGIC_SYSRQ_SERIAL
This option allows you to decide whether you want to enable the
magic SysRq key.
+source "kernel/ipipe/Kconfig.debug"
+
config DEBUG_KERNEL
bool "Kernel debugging"
help
diff --git a/lib/atomic64.c b/lib/atomic64.c
index e98c85a99787..9676c025cd53 100644
--- a/lib/atomic64.c
+++ b/lib/atomic64.c
@@ -25,15 +25,15 @@
* Ensure each lock is in a separate cacheline.
*/
static union {
- raw_spinlock_t lock;
+ ipipe_spinlock_t lock;
char pad[L1_CACHE_BYTES];
} atomic64_lock[NR_LOCKS] __cacheline_aligned_in_smp = {
[0 ... (NR_LOCKS - 1)] = {
- .lock = __RAW_SPIN_LOCK_UNLOCKED(atomic64_lock.lock),
+ .lock = IPIPE_SPIN_LOCK_UNLOCKED,
},
};
-static inline raw_spinlock_t *lock_addr(const atomic64_t *v)
+static inline ipipe_spinlock_t *lock_addr(const atomic64_t *v)
{
unsigned long addr = (unsigned long) v;
@@ -45,7 +45,7 @@ static inline raw_spinlock_t *lock_addr(const atomic64_t *v)
s64 atomic64_read(const atomic64_t *v)
{
unsigned long flags;
- raw_spinlock_t *lock = lock_addr(v);
+ ipipe_spinlock_t *lock = lock_addr(v);
s64 val;
raw_spin_lock_irqsave(lock, flags);
@@ -58,7 +58,7 @@ EXPORT_SYMBOL(atomic64_read);
void atomic64_set(atomic64_t *v, s64 i)
{
unsigned long flags;
- raw_spinlock_t *lock = lock_addr(v);
+ ipipe_spinlock_t *lock = lock_addr(v);
raw_spin_lock_irqsave(lock, flags);
v->counter = i;
@@ -70,7 +70,7 @@ EXPORT_SYMBOL(atomic64_set);
void atomic64_##op(s64 a, atomic64_t *v) \
{ \
unsigned long flags; \
- raw_spinlock_t *lock = lock_addr(v); \
+ ipipe_spinlock_t *lock = lock_addr(v); \
\
raw_spin_lock_irqsave(lock, flags); \
v->counter c_op a; \
@@ -82,7 +82,7 @@ EXPORT_SYMBOL(atomic64_##op);
s64 atomic64_##op##_return(s64 a, atomic64_t *v) \
{ \
unsigned long flags; \
- raw_spinlock_t *lock = lock_addr(v); \
+ ipipe_spinlock_t *lock = lock_addr(v); \
s64 val; \
\
raw_spin_lock_irqsave(lock, flags); \
@@ -96,7 +96,7 @@ EXPORT_SYMBOL(atomic64_##op##_return);
s64 atomic64_fetch_##op(s64 a, atomic64_t *v) \
{ \
unsigned long flags; \
- raw_spinlock_t *lock = lock_addr(v); \
+ ipipe_spinlock_t *lock = lock_addr(v); \
s64 val; \
\
raw_spin_lock_irqsave(lock, flags); \
@@ -133,7 +133,7 @@ ATOMIC64_OPS(xor, ^=)
s64 atomic64_dec_if_positive(atomic64_t *v)
{
unsigned long flags;
- raw_spinlock_t *lock = lock_addr(v);
+ ipipe_spinlock_t *lock = lock_addr(v);
s64 val;
raw_spin_lock_irqsave(lock, flags);
@@ -148,7 +148,7 @@ EXPORT_SYMBOL(atomic64_dec_if_positive);
s64 atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n)
{
unsigned long flags;
- raw_spinlock_t *lock = lock_addr(v);
+ ipipe_spinlock_t *lock = lock_addr(v);
s64 val;
raw_spin_lock_irqsave(lock, flags);
@@ -163,7 +163,7 @@ EXPORT_SYMBOL(atomic64_cmpxchg);
s64 atomic64_xchg(atomic64_t *v, s64 new)
{
unsigned long flags;
- raw_spinlock_t *lock = lock_addr(v);
+ ipipe_spinlock_t *lock = lock_addr(v);
s64 val;
raw_spin_lock_irqsave(lock, flags);
@@ -177,7 +177,7 @@ EXPORT_SYMBOL(atomic64_xchg);
s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
{
unsigned long flags;
- raw_spinlock_t *lock = lock_addr(v);
+ ipipe_spinlock_t *lock = lock_addr(v);
s64 val;
raw_spin_lock_irqsave(lock, flags);
diff --git a/lib/bust_spinlocks.c b/lib/bust_spinlocks.c
index 8be59f84eaea..812d63bd8e66 100644
--- a/lib/bust_spinlocks.c
+++ b/lib/bust_spinlocks.c
@@ -16,6 +16,7 @@
#include <linux/wait.h>
#include <linux/vt_kern.h>
#include <linux/console.h>
+#include <linux/ipipe_trace.h>
void bust_spinlocks(int yes)
{
diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 33ffbf308853..c20f90a9e490 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -8,6 +8,7 @@
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
+#include <linux/ipipe.h>
#include <linux/smp.h>
#include <linux/atomic.h>
#include <linux/kexec.h>
@@ -56,6 +57,9 @@ void dump_stack_print_info(const char *log_lvl)
printk("%sHardware name: %s\n",
log_lvl, dump_stack_arch_desc_str);
+#ifdef CONFIG_IPIPE
+ printk("I-pipe domain: %s\n", ipipe_current_domain->name);
+#endif
print_worker_info(log_lvl, current);
}
@@ -85,6 +89,29 @@ static void __dump_stack(void)
#ifdef CONFIG_SMP
static atomic_t dump_lock = ATOMIC_INIT(-1);
+static unsigned long disable_local_irqs(void)
+{
+ unsigned long flags = 0; /* init only to satisfy uninitialized-memory-read (UMR) detection */
+
+ /*
+ * We neither need nor want to disable root stage IRQs over
+ * the head stage, where CPU migration can't
+ * happen. Conversely, we neither need nor want to disable
+ * hard IRQs from the head stage, so that latency won't
+ * skyrocket as a result of dumping the stack backtrace.
+ */
+ if (ipipe_root_p)
+ local_irq_save(flags);
+
+ return flags;
+}
+
+static void restore_local_irqs(unsigned long flags)
+{
+ if (ipipe_root_p)
+ local_irq_restore(flags);
+}
+
asmlinkage __visible void dump_stack(void)
{
unsigned long flags;
@@ -97,7 +124,7 @@ asmlinkage __visible void dump_stack(void)
* against other CPUs
*/
retry:
- local_irq_save(flags);
+ flags = disable_local_irqs();
cpu = smp_processor_id();
old = atomic_cmpxchg(&dump_lock, -1, cpu);
if (old == -1) {
@@ -105,7 +132,7 @@ retry:
} else if (old == cpu) {
was_locked = 1;
} else {
- local_irq_restore(flags);
+ restore_local_irqs(flags);
/*
* Wait for the lock to release before jumping to
* atomic_cmpxchg() in order to mitigate the thundering herd
@@ -120,7 +147,7 @@ retry:
if (!was_locked)
atomic_set(&dump_lock, -1);
- local_irq_restore(flags);
+ restore_local_irqs(flags);
}
#else
asmlinkage __visible void dump_stack(void)
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 0a2ffadc6d71..b45416b23760 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -11,6 +11,7 @@
#include <linux/sched.h>
#include <linux/io.h>
#include <linux/export.h>
+#include <linux/hardirq.h>
#include <asm/cacheflush.h>
#include <asm/pgtable.h>
@@ -227,7 +228,12 @@ int ioremap_page_range(unsigned long addr,
break;
} while (pgd++, phys_addr += (next - addr), addr = next, addr != end);
- flush_cache_vmap(start, end);
+ /* APEI may invoke this for temporarily remapping pages in interrupt
+ * context - nothing we can or need to propagate globally. */
+ if (!in_interrupt()) {
+ __ipipe_pin_mapping_globally(start, end);
+ flush_cache_vmap(start, end);
+ }
return err;
}
diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 60ba93fc42ce..a5cefa08500e 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -7,12 +7,19 @@
#include <linux/export.h>
#include <linux/kprobes.h>
#include <linux/sched.h>
+#include <linux/ipipe.h>
notrace static nokprobe_inline
unsigned int check_preemption_disabled(const char *what1, const char *what2)
{
int this_cpu = raw_smp_processor_id();
+ if (hard_irqs_disabled())
+ goto out;
+
+ if (!ipipe_root_p)
+ goto out;
+
if (likely(preempt_count()))
goto out;
diff --git a/mm/memory.c b/mm/memory.c
index d416e329442d..16a1de40b5ea 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -56,6 +56,7 @@
#include <linux/export.h>
#include <linux/delayacct.h>
#include <linux/init.h>
+#include <linux/ipipe.h>
#include <linux/pfn_t.h>
#include <linux/writeback.h>
#include <linux/memcontrol.h>
@@ -142,6 +143,9 @@ EXPORT_SYMBOL(zero_pfn);
unsigned long highest_memmap_pfn __read_mostly;
+static bool cow_user_page(struct page *dst, struct page *src,
+ struct vm_fault *vmf);
+
/*
* CONFIG_MMU architectures set up ZERO_PAGE in their paging_init()
*/
@@ -688,8 +692,9 @@ out:
static inline unsigned long
copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
- pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
- unsigned long addr, int *rss)
+ pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
+ unsigned long addr, int *rss, pmd_t *src_pmd,
+ struct page *uncow_page)
{
unsigned long vm_flags = vma->vm_flags;
pte_t pte = *src_pte;
@@ -767,6 +772,33 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* in the parent and the child
*/
if (is_cow_mapping(vm_flags) && pte_write(pte)) {
+#ifdef CONFIG_IPIPE
+ if (uncow_page) {
+ struct page *old_page = vm_normal_page(vma, addr, pte);
+ struct vm_fault vmf;
+
+ vmf.vma = vma;
+ vmf.address = addr;
+ vmf.orig_pte = pte;
+ vmf.pmd = src_pmd;
+
+ if (cow_user_page(uncow_page, old_page, &vmf)) {
+ pte = mk_pte(uncow_page, vma->vm_page_prot);
+
+ if (vm_flags & VM_SHARED)
+ pte = pte_mkclean(pte);
+ pte = pte_mkold(pte);
+
+ page_add_new_anon_rmap(uncow_page, vma, addr,
+ false);
+ rss[!!PageAnon(uncow_page)]++;
+ goto out_set_pte;
+ } else {
+ /* unexpected: source page no longer present */
+ WARN_ON_ONCE(1);
+ }
+ }
+#endif /* CONFIG_IPIPE */
ptep_set_wrprotect(src_mm, addr, src_pte);
pte = pte_wrprotect(pte);
}
@@ -803,13 +835,27 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
int progress = 0;
int rss[NR_MM_COUNTERS];
swp_entry_t entry = (swp_entry_t){0};
-
+ struct page *uncow_page = NULL;
+#ifdef CONFIG_IPIPE
+ int do_cow_break = 0;
+again:
+ if (do_cow_break) {
+ uncow_page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
+ if (uncow_page == NULL)
+ return -ENOMEM;
+ do_cow_break = 0;
+ }
+#else
again:
+#endif
init_rss_vec(rss);
dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
- if (!dst_pte)
+ if (!dst_pte) {
+ if (uncow_page)
+ put_page(uncow_page);
return -ENOMEM;
+ }
src_pte = pte_offset_map(src_pmd, addr);
src_ptl = pte_lockptr(src_mm, src_pmd);
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
@@ -832,8 +878,25 @@ again:
progress++;
continue;
}
+#ifdef CONFIG_IPIPE
+ if (likely(uncow_page == NULL) && likely(pte_present(*src_pte))) {
+ if (is_cow_mapping(vma->vm_flags) &&
+ test_bit(MMF_VM_PINNED, &src_mm->flags) &&
+ ((vma->vm_flags|src_mm->def_flags) & VM_LOCKED)) {
+ arch_leave_lazy_mmu_mode();
+ spin_unlock(src_ptl);
+ pte_unmap(src_pte);
+ add_mm_rss_vec(dst_mm, rss);
+ pte_unmap_unlock(dst_pte, dst_ptl);
+ cond_resched();
+ do_cow_break = 1;
+ goto again;
+ }
+ }
+#endif
entry.val = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
- vma, addr, rss);
+ vma, addr, rss, src_pmd, uncow_page);
+ uncow_page = NULL;
if (entry.val)
break;
progress += 8;
@@ -2181,8 +2244,8 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
return same;
}
-static inline bool cow_user_page(struct page *dst, struct page *src,
- struct vm_fault *vmf)
+static bool cow_user_page(struct page *dst, struct page *src,
+ struct vm_fault *vmf)
{
bool ret;
void *kaddr;
@@ -4809,6 +4872,41 @@ long copy_huge_page_from_user(struct page *dst_page,
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
+#ifdef CONFIG_IPIPE
+
+int __ipipe_disable_ondemand_mappings(struct task_struct *tsk)
+{
+ struct vm_area_struct *vma;
+ struct mm_struct *mm;
+ int result = 0;
+
+ mm = get_task_mm(tsk);
+ if (!mm)
+ return -EPERM;
+
+ down_write(&mm->mmap_sem);
+ if (test_bit(MMF_VM_PINNED, &mm->flags))
+ goto done_mm;
+
+ for (vma = mm->mmap; vma; vma = vma->vm_next) {
+ if (is_cow_mapping(vma->vm_flags) &&
+ (vma->vm_flags & VM_WRITE)) {
+ result = __ipipe_pin_vma(mm, vma);
+ if (result < 0)
+ goto done_mm;
+ }
+ }
+ set_bit(MMF_VM_PINNED, &mm->flags);
+
+ done_mm:
+ up_write(&mm->mmap_sem);
+ mmput(mm);
+ return result;
+}
+EXPORT_SYMBOL_GPL(__ipipe_disable_ondemand_mappings);
+
+#endif /* CONFIG_IPIPE */
+
#if USE_SPLIT_PTE_PTLOCKS && ALLOC_SPLIT_PTLOCKS
static struct kmem_cache *page_ptl_cachep;
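
__ipipe_disable_ondemand_mappings() above walks every writable private mapping of the target task, breaks its COW sharing through __ipipe_pin_vma() (added to mm/mlock.c just below) and marks the mm MMF_VM_PINNED, so the task can no longer take demand-paging or COW faults once it runs over the head domain. A hypothetical co-kernel caller might look like this; rt_commit_memory() is an invented name:

/*
 * Hypothetical: called once a real-time task has locked its memory
 * (e.g. after mlockall(MCL_CURRENT | MCL_FUTURE)), before it is
 * allowed to run out-of-band.
 */
static int rt_commit_memory(struct task_struct *tsk)
{
	int ret;

	ret = __ipipe_disable_ondemand_mappings(tsk);
	if (ret)
		pr_warn("rt: could not pin memory of pid %d (%d)\n",
			task_pid_nr(tsk), ret);

	return ret;
}
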
diff --git a/mm/mlock.c b/mm/mlock.c
index a72c1eeded77..a10e73ab97ee 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -871,3 +871,29 @@ void user_shm_unlock(size_t size, struct user_struct *user)
spin_unlock(&shmlock_user_lock);
free_uid(user);
}
+
+#ifdef CONFIG_IPIPE
+int __ipipe_pin_vma(struct mm_struct *mm, struct vm_area_struct *vma)
+{
+ unsigned int gup_flags = 0;
+ int ret, len;
+
+ if (vma->vm_flags & (VM_IO | VM_PFNMAP))
+ return 0;
+
+ if (!((vma->vm_flags & VM_DONTEXPAND) ||
+ is_vm_hugetlb_page(vma) || vma == get_gate_vma(mm))) {
+ ret = populate_vma_page_range(vma, vma->vm_start, vma->vm_end,
+ NULL);
+ return ret < 0 ? ret : 0;
+ }
+
+ if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE)
+ gup_flags |= FOLL_WRITE;
+ len = DIV_ROUND_UP(vma->vm_end, PAGE_SIZE) - vma->vm_start/PAGE_SIZE;
+ ret = get_user_pages_locked(vma->vm_start, len, gup_flags, NULL, NULL);
+ if (ret < 0)
+ return ret;
+ return ret == len ? 0 : -EFAULT;
+}
+#endif
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
index a1da47e02747..ab150be0e88e 100644
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -9,6 +9,7 @@
#include <linux/sched/task.h>
#include <linux/mmu_context.h>
#include <linux/export.h>
+#include <linux/ipipe.h>
#include <asm/mmu_context.h>
@@ -23,10 +24,12 @@ void use_mm(struct mm_struct *mm)
{
struct mm_struct *active_mm;
struct task_struct *tsk = current;
+ unsigned long flags;
task_lock(tsk);
/* Hold off tlb flush IPIs while switching mm's */
local_irq_disable();
+ ipipe_mm_switch_protect(flags);
active_mm = tsk->active_mm;
if (active_mm != mm) {
mmgrab(mm);
@@ -34,6 +37,7 @@ void use_mm(struct mm_struct *mm)
}
tsk->mm = mm;
switch_mm_irqs_off(active_mm, mm, tsk);
+ ipipe_mm_switch_unprotect(flags);
local_irq_enable();
task_unlock(tsk);
#ifdef finish_arch_post_lock_switch
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 95dee88f782b..0cdc97eaead3 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -22,6 +22,7 @@
#include <linux/swap.h>
#include <linux/swapops.h>
#include <linux/mmu_notifier.h>
+#include <linux/ipipe.h>
#include <linux/migrate.h>
#include <linux/perf_event.h>
#include <linux/pkeys.h>
@@ -41,7 +42,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
{
pte_t *pte, oldpte;
spinlock_t *ptl;
- unsigned long pages = 0;
+ unsigned long pages = 0, flags;
int target_node = NUMA_NO_NODE;
/*
@@ -109,6 +110,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
continue;
}
+ flags = hard_local_irq_save();
oldpte = ptep_modify_prot_start(vma, addr, pte);
ptent = pte_modify(oldpte, newprot);
if (preserve_write)
@@ -121,6 +123,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
ptent = pte_mkwrite(ptent);
}
ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
+ hard_local_irq_restore(flags);
pages++;
} else if (IS_ENABLED(CONFIG_MIGRATION)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
@@ -338,6 +341,12 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
pages = hugetlb_change_protection(vma, start, end, newprot);
else
pages = change_protection_range(vma, start, end, newprot, dirty_accountable, prot_numa);
+#ifdef CONFIG_IPIPE
+ if (test_bit(MMF_VM_PINNED, &vma->vm_mm->flags) &&
+ ((vma->vm_flags | vma->vm_mm->def_flags) & VM_LOCKED) &&
+ (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
+ __ipipe_pin_vma(vma->vm_mm, vma);
+#endif
return pages;
}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5797e1eeaa7e..80dff9b8d391 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -238,6 +238,8 @@ static int vmap_page_range_noflush(unsigned long start, unsigned long end,
return err;
} while (pgd++, addr = next, addr != end);
+ __ipipe_pin_mapping_globally(start, end);
+
return nr;
}