[so2] [tema3][linux] spinlock recursion
Daniel Baluta
daniel.baluta at gmail.com
Wed Apr 11 09:29:16 EEST 2012
On Tue, Apr 10, 2012 at 10:27 PM, Daniel Baluta <daniel.baluta at gmail.com> wrote:
> On Tue, Apr 10, 2012 at 9:56 PM, Daniel Pletea <dani.pletea at gmail.com> wrote:
>>
>>
>> On Tue, Apr 10, 2012 at 9:48 PM, Daniel Baluta <daniel.baluta at gmail.com>
>> wrote:
>>>
>>> On Tue, Apr 10, 2012 at 9:39 PM, Daniel Pletea <dani.pletea at gmail.com>
>>> wrote:
>>> >
>>> >
>>> > On Tue, Apr 10, 2012 at 9:27 PM, Daniel Pletea <dani.pletea at gmail.com>
>>> > wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Apr 10, 2012 at 9:24 PM, Daniel Pletea <dani.pletea at gmail.com>
>>> >> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> 2012/4/10 Daniel Baluta <daniel.baluta at gmail.com>
>>> >>>>
>>> >>>> 2012/4/10 Daniel Pletea <dani.pletea at gmail.com>:
>>> >>>> > Hi,
>>> >>>> >
>>> >>>> > I have a bug: spinlock recursion; and I can't manage to fix it. I
>>> >>>> > can't figure out where it comes from. What I've done so far is
>>> >>>> > receive the bios from userspace. I created a bio for each read
>>> >>>> > operation and submitted it to PHYSICAL_DISK1, on sector 0 (that is,
>>> >>>> > I made a dummy read bio on sector 0). I allocated the page and set
>>> >>>> > up the event. I used workqueues, and in the work_handler, after
>>> >>>> > wait_for_completion, I made sure to free everything that needed
>>> >>>> > freeing. Also, I'm not using any spinlock.
>>> >>>> >
>>> >>>> > I ran ./test several times. Sometimes it finishes fine, sometimes it
>>> >>>> > hits this spinlock recursion.
>>> >>>> >
>>> >>>> > Should I post the call trace too? It's rather large.
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> Yes, post the trace as well.
>>> >>>>
>>> >>>> thanks,
>>> >>>> Daniel.
>>> >>>> _______________________________________________
>>> >>>> http://elf.cs.pub.ro/so2/wiki/resurse/lista-discutii
>>> >>>
>>> >>>
>>> >>>
>>> >>> Call Trace:
>>> >>> [<c010a8f5>] ? show_regs+0x25/0x30
>>> >>> [<c059d665>] nmi_watchdog_tick+0x1c5/0x1f0
>>> >>> [<c059c6fb>] do_nmi+0x13b/0x290
>>> >>> [<c059c170>] nmi_stack_correct+0x2f/0x34
>>> >>> [<c01200e0>] ? adjust_for_32bit_ctr+0x20/0xb0
>>> >>> [<c0123c65>] ? default_send_IPI_mask_logical+0x75/0xf0
>>> >>> [<c0123a60>] default_send_IPI_all+0x90/0xa0
>>> >>> [<c01240a7>] arch_trigger_all_cpu_backtrace+0x37/0x70
>>> >>> [<c037a59e>] do_raw_spin_lock+0x12e/0x140
>>> >>> [<c059ab56>] _raw_spin_lock_irqsave+0x76/0x90
>>> >>> [<c036969a>] ? cfq_idle_slice_timer+0x2a/0xe0
>>> >>> [<c036969a>] cfq_idle_slice_timer+0x2a/0xe0
>>> >>> [<c015b611>] run_timer_softirq+0x161/0x370
>>> >>> [<c015b589>] ? run_timer_softirq+0xd9/0x370
>>> >>> [<c0369670>] ? cfq_idle_slice_timer+0x0/0xe0
>>> >>> [<c0152c43>] __do_softirq+0xa3/0x1e0
>>> >>> [<c017ddb8>] ? __lock_acquire+0x178/0x500
>>> >>> [<c017ddb8>] ? __lock_acquire+0x178/0x500
>>> >>> [<c0152dc5>] do_softirq+0x45/0x50
>>> >>> [<c01531a5>] irq_exit+0x85/0x90
>>> >>> [<c05a191a>] smp_apic_timer_interrupt+0x5a/0x89
>>> >>> [<c059be41>] apic_timer_interrupt+0x31/0x38
>>> >>> [<c02100d8>] ? sys_swapoff+0x1e8/0x970
>>> >>> [<c0219c36>] ? __slab_alloc+0x116/0x600
>>> >>> [<c0165a16>] ? __queue_work+0x36/0x50
>>> >>> [<c021a291>] kmem_cache_alloc_notrace+0xb1/0xc0
>>> >>> [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> >>> [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> >>> [<d082319f>] send_test_bio+0x1f/0x180 [ssr]
>>> >>> [<c013394f>] ? kmap_atomic_prot+0x4f/0xf0
>>> >>> [<c059eb89>] ? sub_preempt_count+0x9/0xb0
>>> >>> [<d08233ca>] my_block_request+0xca/0x198 [ssr]
>>> >>> [<c0357fde>] __generic_unplug_device+0x2e/0x40
>>> >>> [<c0358016>] generic_unplug_device+0x26/0x40
>>> >>> [<c035904c>] blk_unplug+0x2c/0x70
>>> >>> [<c03590a0>] blk_backing_dev_unplug+0x10/0x20
>>> >>> [<c024e79f>] block_sync_page+0x3f/0x50
>>> >>> [<c01e6c67>] sync_page+0x37/0x50
>>> >>> [<c01e6c8d>] sync_page_killable+0xd/0x40
>>> >>> [<c0599117>] __wait_on_bit_lock+0x47/0x90
>>> >>> [<c013b69b>] ? get_parent_ip+0xb/0x40
>>> >>> [<c01e6c80>] ? sync_page_killable+0x0/0x40
>>> >>> [<c01e6b6b>] __lock_page_killable+0x7b/0x90
>>> >>> [<c0169b70>] ? wake_bit_function+0x0/0x50
>>> >>> [<c01e862e>] generic_file_aio_read+0x47e/0x730
>>> >>> [<c0228af4>] do_sync_read+0xa4/0xe0
>>> >>> [<c013b69b>] ? get_parent_ip+0xb/0x40
>>> >>> [<c0348136>] ? apparmor_file_permission+0x16/0x20
>>> >>> [<c031a6a4>] ? security_file_permission+0x14/0x20
>>> >>> [<c0228b94>] ? rw_verify_area+0x64/0xe0
>>> >>> [<c059954d>] ? mutex_unlock+0xd/0x10
>>> >>> [<c022956f>] vfs_read+0x9f/0x1a0
>>> >>> [<c0253d00>] ? block_llseek+0x0/0xf0
>>> >>> [<c0228a50>] ? do_sync_read+0x0/0xe0
>>> >>> [<c02296b2>] sys_read+0x42/0x70
>>> >>> [<c059b998>] syscall_call+0x7/0xb
>>> >>> [<c0590000>] ? cpu_detect+0x6f/0xc3
>>> >>>
>>> >>
>>> >>
>>> >> That's not the trace. I'll be right back with the "right" trace.
>>> >>
>>> >> Sorry
>>> >>
>>> >
>>> > This is the "right" call trace:
>>> >
>>> > "[ 235.962093] BUG: spinlock recursion on CPU#0, test/2291
>>> > [ 235.962595] lock: d08238e0, .magic: dead4ead, .owner: test/2291,
>>> > .owner_cpu:
>>> > 0
>>> > [ 235.963238] Pid: 2291, comm: test Not tainted 2.6.35.10-so-so2 #2
>>> > [ 235.963702] Call Trace:
>>> > [ 235.964106] [<c0597fee>] ? printk+0x2d/0x2f
>>> > [ 235.964581] [<c037a383>] spin_bug+0xa3/0xf0
>>> > [ 235.966913] [<c037a55a>] do_raw_spin_lock+0xea/0x140
>>> > [ 235.967426] [<c017bab1>] ? put_lock_stats+0x21/0x30
>>> > [ 235.967919] [<c036969a>] ? cfq_idle_slice_timer+0x2a/0xe0
>>> > [ 235.968550] [<c035800f>] ? generic_unplug_device+0x1f/0x40
>>> > [ 235.969068] [<c059ab56>] _raw_spin_lock_irqsave+0x76/0x90
>>> > [ 235.969570] [<c036969a>] ? cfq_idle_slice_timer+0x2a/0xe0
>>> > [ 235.970023] [<c036969a>] cfq_idle_slice_timer+0x2a/0xe0
>>> > [ 235.970448] [<c015b611>] run_timer_softirq+0x161/0x370
>>> > [ 235.970913] [<c015b589>] ? run_timer_softirq+0xd9/0x370
>>> > [ 235.971731] [<c0369670>] ? cfq_idle_slice_timer+0x0/0xe0
>>> > [ 235.972313] [<c0152c43>] __do_softirq+0xa3/0x1e0
>>> > [ 235.972815] [<c017ddb8>] ? __lock_acquire+0x178/0x500
>>> > [ 235.973289] [<c017ddb8>] ? __lock_acquire+0x178/0x500
>>> > [ 235.973807] [<c0152dc5>] do_softirq+0x45/0x50
>>> > [ 235.974270] [<c01531a5>] irq_exit+0x85/0x90
>>> > [ 235.974696] [<c05a191a>] smp_apic_timer_interrupt+0x5a/0x89
>>> > [ 235.975202] [<c059be41>] apic_timer_interrupt+0x31/0x38
>>> > [ 235.975741] [<c02100d8>] ? sys_swapoff+0x1e8/0x970
>>> > [ 235.976197] [<c0219c36>] ? __slab_alloc+0x116/0x600
>>> > [ 235.976651] [<c0165a16>] ? __queue_work+0x36/0x50
>>> > [ 235.977220] [<c021a291>] kmem_cache_alloc_notrace+0xb1/0xc0
>>> > [ 235.977935] [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> > [ 235.978552] [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> > [ 235.979263] [<d082319f>] send_test_bio+0x1f/0x180 [ssr]
>>> > [ 235.981221] [<c013394f>] ? kmap_atomic_prot+0x4f/0xf0
>>> > [ 235.982220] [<c059eb89>] ? sub_preempt_count+0x9/0xb0
>>> > [ 235.982687] [<d08233ca>] my_block_request+0xca/0x198 [ssr]
>>> > [ 235.983151] [<c0357fde>] __generic_unplug_device+0x2e/0x40
>>> > [ 235.983604] [<c0358016>] generic_unplug_device+0x26/0x40
>>> > [ 235.984051] [<c035904c>] blk_unplug+0x2c/0x70
>>> > [ 235.984474] [<c03590a0>] blk_backing_dev_unplug+0x10/0x20
>>> > [ 235.984900] [<c024e79f>] block_sync_page+0x3f/0x50
>>> > [ 235.985392] [<c01e6c67>] sync_page+0x37/0x50
>>> > [ 235.985815] [<c01e6c8d>] sync_page_killable+0xd/0x40
>>> > [ 235.986311] [<c0599117>] __wait_on_bit_lock+0x47/0x90
>>> > [ 235.986783] [<c013b69b>] ? get_parent_ip+0xb/0x40
>>> > [ 235.987382] [<c01e6c80>] ? sync_page_killable+0x0/0x40
>>> > [ 235.987794] [<c01e6b6b>] __lock_page_killable+0x7b/0x90
>>> > [ 235.988268] [<c0169b70>] ? wake_bit_function+0x0/0x50
>>> > [ 235.988772] [<c01e862e>] generic_file_aio_read+0x47e/0x730
>>> > [ 235.989258] [<c0228af4>] do_sync_read+0xa4/0xe0
>>> > [ 235.990241] [<c013b69b>] ? get_parent_ip+0xb/0x40
>>> > [ 235.990712] [<c0348136>] ? apparmor_file_permission+0x16/0x20
>>> > [ 235.991190] [<c031a6a4>] ? security_file_permission+0x14/0x20
>>> > [ 235.991820] [<c0228b94>] ? rw_verify_area+0x64/0xe0
>>> > [ 235.992317] [<c059954d>] ? mutex_unlock+0xd/0x10
>>> > [ 235.992753] [<c022956f>] vfs_read+0x9f/0x1a0
>>> > [ 235.993232] [<c0253d00>] ? block_llseek+0x0/0xf0
>>> > [ 235.995736] [<c0228a50>] ? do_sync_read+0x0/0xe0
>>> > [ 235.996276] [<c02296b2>] sys_read+0x42/0x70
>>> > [ 235.996863] [<c059b998>] syscall_call+0x7/0xb
>>> > [ 235.997324] [<c0590000>] ? cpu_detect+0x6f/0xc3
>>>
>>> Please upload the assignment to vmchecker :), so I can
>>> take a look at the code.
>>
>>
>> I've uploaded it.
>
> OK. You are calling __bio_kmap_atomic, and
> according to the comment here [1], you are not
> allowed to sleep while holding a reference to a
> page obtained with this function.
>
> In your case, this is where you end up sleeping:
>
>>> > [ 235.977220] [<c021a291>] kmem_cache_alloc_notrace+0xb1/0xc0
>>> > [ 235.977935] [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> > [ 235.978552] [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> > [ 235.979263] [<d082319f>] send_test_bio+0x1f/0x180 [ssr]
>>> > [ 235.981221] [<c013394f>] ? kmap_atomic_prot+0x4f/0xf0
>>> > [ 235.982220] [<c059eb89>] ? sub_preempt_count+0x9/0xb0
>>> > [ 235.982687] [<d08233ca>] my_block_request+0xca/0x198 [ssr]
>
> You have two options:
> 1. either use GFP_ATOMIC when allocating memory in send_test_bio;
> 2. or pre-allocate a tmp_buf beforehand, do buf = __bio_kmap_atomic,
> copy buf into tmp_buf with memcpy, call __bio_kunmap_atomic, and then
> use tmp_buf for the rest of the operations.
>
> I recommend option 2.
>
> thanks,
> Daniel.
> [1] http://lxr.linux.no/linux+v3.3.1/arch/x86/mm/highmem_32.c#L24
Hi,
Let us know if this solved it.
thanks,
Daniel.
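For reference, option 2 from the quoted advice could be sketched as below. This is a minimal illustration, assuming the 2.6.35-era kmap API (`KM_USER0` slot types) and a caller-supplied buffer; the function name, `tmp_buf`, and the use of segment index 0 are illustrative, not part of the assignment's actual code:

```c
/* Sketch only: copy a bio segment out of its atomic mapping before
 * doing anything that may sleep. Illustrative names (tmp_buf). */
#include <linux/bio.h>
#include <linux/string.h>

static void read_bio_segment(struct bio *bio, char *tmp_buf)
{
	char *buf;

	/* kmap_atomic() disables preemption: between the map and the
	 * unmap, no sleeping is allowed -- so no GFP_KERNEL allocations,
	 * no wait_for_completion(), no submit-and-wait. */
	buf = __bio_kmap_atomic(bio, 0, KM_USER0);
	memcpy(tmp_buf, buf, bio_cur_bytes(bio));
	__bio_kunmap_atomic(buf, KM_USER0);

	/* tmp_buf now holds the data; sleeping is allowed again, so the
	 * rest of the processing can use normal GFP_KERNEL code paths. */
}
```

This is why option 2 is preferred over option 1: it keeps the atomic section as short as a single memcpy, instead of forcing every allocation on the path to be GFP_ATOMIC.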
More information about the so2 mailing list