[so2] [tema3][linux] spinlock recursion

Daniel Baluta daniel.baluta at gmail.com
Wed Apr 11 09:29:16 EEST 2012


On Tue, Apr 10, 2012 at 10:27 PM, Daniel Baluta <daniel.baluta at gmail.com> wrote:
> On Tue, Apr 10, 2012 at 9:56 PM, Daniel Pletea <dani.pletea at gmail.com> wrote:
>>
>>
>> On Tue, Apr 10, 2012 at 9:48 PM, Daniel Baluta <daniel.baluta at gmail.com>
>> wrote:
>>>
>>> On Tue, Apr 10, 2012 at 9:39 PM, Daniel Pletea <dani.pletea at gmail.com>
>>> wrote:
>>> >
>>> >
>>> > On Tue, Apr 10, 2012 at 9:27 PM, Daniel Pletea <dani.pletea at gmail.com>
>>> > wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Apr 10, 2012 at 9:24 PM, Daniel Pletea <dani.pletea at gmail.com>
>>> >> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> 2012/4/10 Daniel Baluta <daniel.baluta at gmail.com>
>>> >>>>
>>> >>>> 2012/4/10 Daniel Pletea <dani.pletea at gmail.com>:
>>> >>>> > Hi,
>>> >>>> >
>>> >>>> > I have a bug, spinlock recursion, that I can't manage to fix. I
>>> >>>> > can't figure out where it comes from. What I've done so far is
>>> >>>> > receive the bios from userspace. I created a bio for each read
>>> >>>> > operation and submitted it to PHYSICAL_DISK1 at sector 0 (i.e. a
>>> >>>> > dummy read bio on sector 0). I allocated the page and set up the
>>> >>>> > completion event. I used workqueues, and in work_handler, after
>>> >>>> > wait_for_completion, I made sure to free everything that needed
>>> >>>> > freeing. Also, I don't use any spinlock.
>>> >>>> >
>>> >>>> > I ran ./test several times. Sometimes it finishes fine, sometimes
>>> >>>> > it hits this spinlock recursion.
>>> >>>> >
>>> >>>> > Should I also post the call trace? It's rather big.
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> Yes, please post the trace too.
>>> >>>>
>>> >>>> thanks,
>>> >>>> Daniel.
>>> >>>
>>> >>>
>>> >>>
>>> >>> Call Trace:
>>> >>>  [<c010a8f5>] ? show_regs+0x25/0x30
>>> >>>  [<c059d665>] nmi_watchdog_tick+0x1c5/0x1f0
>>> >>>  [<c059c6fb>] do_nmi+0x13b/0x290
>>> >>>  [<c059c170>] nmi_stack_correct+0x2f/0x34
>>> >>>  [<c01200e0>] ? adjust_for_32bit_ctr+0x20/0xb0
>>> >>>  [<c0123c65>] ? default_send_IPI_mask_logical+0x75/0xf0
>>> >>>  [<c0123a60>] default_send_IPI_all+0x90/0xa0
>>> >>>  [<c01240a7>] arch_trigger_all_cpu_backtrace+0x37/0x70
>>> >>>  [<c037a59e>] do_raw_spin_lock+0x12e/0x140
>>> >>>  [<c059ab56>] _raw_spin_lock_irqsave+0x76/0x90
>>> >>>  [<c036969a>] ? cfq_idle_slice_timer+0x2a/0xe0
>>> >>>  [<c036969a>] cfq_idle_slice_timer+0x2a/0xe0
>>> >>>  [<c015b611>] run_timer_softirq+0x161/0x370
>>> >>>  [<c015b589>] ? run_timer_softirq+0xd9/0x370
>>> >>>  [<c0369670>] ? cfq_idle_slice_timer+0x0/0xe0
>>> >>>  [<c0152c43>] __do_softirq+0xa3/0x1e0
>>> >>>  [<c017ddb8>] ? __lock_acquire+0x178/0x500
>>> >>>  [<c017ddb8>] ? __lock_acquire+0x178/0x500
>>> >>>  [<c0152dc5>] do_softirq+0x45/0x50
>>> >>>  [<c01531a5>] irq_exit+0x85/0x90
>>> >>>  [<c05a191a>] smp_apic_timer_interrupt+0x5a/0x89
>>> >>>  [<c059be41>] apic_timer_interrupt+0x31/0x38
>>> >>>  [<c02100d8>] ? sys_swapoff+0x1e8/0x970
>>> >>>  [<c0219c36>] ? __slab_alloc+0x116/0x600
>>> >>>  [<c0165a16>] ? __queue_work+0x36/0x50
>>> >>>  [<c021a291>] kmem_cache_alloc_notrace+0xb1/0xc0
>>> >>>  [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> >>>  [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> >>>  [<d082319f>] send_test_bio+0x1f/0x180 [ssr]
>>> >>>  [<c013394f>] ? kmap_atomic_prot+0x4f/0xf0
>>> >>>  [<c059eb89>] ? sub_preempt_count+0x9/0xb0
>>> >>>  [<d08233ca>] my_block_request+0xca/0x198 [ssr]
>>> >>>  [<c0357fde>] __generic_unplug_device+0x2e/0x40
>>> >>>  [<c0358016>] generic_unplug_device+0x26/0x40
>>> >>>  [<c035904c>] blk_unplug+0x2c/0x70
>>> >>>  [<c03590a0>] blk_backing_dev_unplug+0x10/0x20
>>> >>>  [<c024e79f>] block_sync_page+0x3f/0x50
>>> >>>  [<c01e6c67>] sync_page+0x37/0x50
>>> >>>  [<c01e6c8d>] sync_page_killable+0xd/0x40
>>> >>>  [<c0599117>] __wait_on_bit_lock+0x47/0x90
>>> >>>  [<c013b69b>] ? get_parent_ip+0xb/0x40
>>> >>>  [<c01e6c80>] ? sync_page_killable+0x0/0x40
>>> >>>  [<c01e6b6b>] __lock_page_killable+0x7b/0x90
>>> >>>  [<c0169b70>] ? wake_bit_function+0x0/0x50
>>> >>>  [<c01e862e>] generic_file_aio_read+0x47e/0x730
>>> >>>  [<c0228af4>] do_sync_read+0xa4/0xe0
>>> >>>  [<c013b69b>] ? get_parent_ip+0xb/0x40
>>> >>>  [<c0348136>] ? apparmor_file_permission+0x16/0x20
>>> >>>  [<c031a6a4>] ? security_file_permission+0x14/0x20
>>> >>>  [<c0228b94>] ? rw_verify_area+0x64/0xe0
>>> >>>  [<c059954d>] ? mutex_unlock+0xd/0x10
>>> >>>  [<c022956f>] vfs_read+0x9f/0x1a0
>>> >>>  [<c0253d00>] ? block_llseek+0x0/0xf0
>>> >>>  [<c0228a50>] ? do_sync_read+0x0/0xe0
>>> >>>  [<c02296b2>] sys_read+0x42/0x70
>>> >>>  [<c059b998>] syscall_call+0x7/0xb
>>> >>>  [<c0590000>] ? cpu_detect+0x6f/0xc3
>>> >>>
>>> >>
>>> >>
>>> >> That's not the trace. I'll follow up right away with the "good" one.
>>> >>
>>> >> Sorry
>>> >>
>>> >
>>> > This is the "good" call trace:
>>> >
>>> > "[  235.962093] BUG: spinlock recursion on CPU#0, test/2291
>>> > [  235.962595]  lock: d08238e0, .magic: dead4ead, .owner: test/2291,
>>> > .owner_cpu:
>>> >  0
>>> > [  235.963238] Pid: 2291, comm: test Not tainted 2.6.35.10-so-so2 #2
>>> > [  235.963702] Call Trace:
>>> > [  235.964106]  [<c0597fee>] ? printk+0x2d/0x2f
>>> > [  235.964581]  [<c037a383>] spin_bug+0xa3/0xf0
>>> > [  235.966913]  [<c037a55a>] do_raw_spin_lock+0xea/0x140
>>> > [  235.967426]  [<c017bab1>] ? put_lock_stats+0x21/0x30
>>> > [  235.967919]  [<c036969a>] ? cfq_idle_slice_timer+0x2a/0xe0
>>> > [  235.968550]  [<c035800f>] ? generic_unplug_device+0x1f/0x40
>>> > [  235.969068]  [<c059ab56>] _raw_spin_lock_irqsave+0x76/0x90
>>> > [  235.969570]  [<c036969a>] ? cfq_idle_slice_timer+0x2a/0xe0
>>> > [  235.970023]  [<c036969a>] cfq_idle_slice_timer+0x2a/0xe0
>>> > [  235.970448]  [<c015b611>] run_timer_softirq+0x161/0x370
>>> > [  235.970913]  [<c015b589>] ? run_timer_softirq+0xd9/0x370
>>> > [  235.971731]  [<c0369670>] ? cfq_idle_slice_timer+0x0/0xe0
>>> > [  235.972313]  [<c0152c43>] __do_softirq+0xa3/0x1e0
>>> > [  235.972815]  [<c017ddb8>] ? __lock_acquire+0x178/0x500
>>> > [  235.973289]  [<c017ddb8>] ? __lock_acquire+0x178/0x500
>>> > [  235.973807]  [<c0152dc5>] do_softirq+0x45/0x50
>>> > [  235.974270]  [<c01531a5>] irq_exit+0x85/0x90
>>> > [  235.974696]  [<c05a191a>] smp_apic_timer_interrupt+0x5a/0x89
>>> > [  235.975202]  [<c059be41>] apic_timer_interrupt+0x31/0x38
>>> > [  235.975741]  [<c02100d8>] ? sys_swapoff+0x1e8/0x970
>>> > [  235.976197]  [<c0219c36>] ? __slab_alloc+0x116/0x600
>>> > [  235.976651]  [<c0165a16>] ? __queue_work+0x36/0x50
>>> > [  235.977220]  [<c021a291>] kmem_cache_alloc_notrace+0xb1/0xc0
>>> > [  235.977935]  [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> > [  235.978552]  [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> > [  235.979263]  [<d082319f>] send_test_bio+0x1f/0x180 [ssr]
>>> > [  235.981221]  [<c013394f>] ? kmap_atomic_prot+0x4f/0xf0
>>> > [  235.982220]  [<c059eb89>] ? sub_preempt_count+0x9/0xb0
>>> > [  235.982687]  [<d08233ca>] my_block_request+0xca/0x198 [ssr]
>>> > [  235.983151]  [<c0357fde>] __generic_unplug_device+0x2e/0x40
>>> > [  235.983604]  [<c0358016>] generic_unplug_device+0x26/0x40
>>> > [  235.984051]  [<c035904c>] blk_unplug+0x2c/0x70
>>> > [  235.984474]  [<c03590a0>] blk_backing_dev_unplug+0x10/0x20
>>> > [  235.984900]  [<c024e79f>] block_sync_page+0x3f/0x50
>>> > [  235.985392]  [<c01e6c67>] sync_page+0x37/0x50
>>> > [  235.985815]  [<c01e6c8d>] sync_page_killable+0xd/0x40
>>> > [  235.986311]  [<c0599117>] __wait_on_bit_lock+0x47/0x90
>>> > [  235.986783]  [<c013b69b>] ? get_parent_ip+0xb/0x40
>>> > [  235.987382]  [<c01e6c80>] ? sync_page_killable+0x0/0x40
>>> > [  235.987794]  [<c01e6b6b>] __lock_page_killable+0x7b/0x90
>>> > [  235.988268]  [<c0169b70>] ? wake_bit_function+0x0/0x50
>>> > [  235.988772]  [<c01e862e>] generic_file_aio_read+0x47e/0x730
>>> > [  235.989258]  [<c0228af4>] do_sync_read+0xa4/0xe0
>>> > [  235.990241]  [<c013b69b>] ? get_parent_ip+0xb/0x40
>>> > [  235.990712]  [<c0348136>] ? apparmor_file_permission+0x16/0x20
>>> > [  235.991190]  [<c031a6a4>] ? security_file_permission+0x14/0x20
>>> > [  235.991820]  [<c0228b94>] ? rw_verify_area+0x64/0xe0
>>> > [  235.992317]  [<c059954d>] ? mutex_unlock+0xd/0x10
>>> > [  235.992753]  [<c022956f>] vfs_read+0x9f/0x1a0
>>> > [  235.993232]  [<c0253d00>] ? block_llseek+0x0/0xf0
>>> > [  235.995736]  [<c0228a50>] ? do_sync_read+0x0/0xe0
>>> > [  235.996276]  [<c02296b2>] sys_read+0x42/0x70
>>> > [  235.996863]  [<c059b998>] syscall_call+0x7/0xb
>>> > [  235.997324]  [<c0590000>] ? cpu_detect+0x6f/0xc3
>>>
>>> Please upload the assignment to vmchecker :), so I can
>>> take a quick look at the code.
>>
>>
>> I've uploaded it.
>
> OK. You call __bio_kmap_atomic, and,
> according to the comment here [1], you
> are not allowed to sleep while holding
> a reference to a page obtained with
> this function.
>
> In your case, you end up sleeping like this:
>
>>> > [  235.977220]  [<c021a291>] kmem_cache_alloc_notrace+0xb1/0xc0
>>> > [  235.977935]  [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> > [  235.978552]  [<d082319f>] ? send_test_bio+0x1f/0x180 [ssr]
>>> > [  235.979263]  [<d082319f>] send_test_bio+0x1f/0x180 [ssr]
>>> > [  235.981221]  [<c013394f>] ? kmap_atomic_prot+0x4f/0xf0
>>> > [  235.982220]  [<c059eb89>] ? sub_preempt_count+0x9/0xb0
>>> > [  235.982687]  [<d08233ca>] my_block_request+0xca/0x198 [ssr]
>
> You have two options:
> 1. either use GFP_ATOMIC when allocating memory in send_test_bio;
> 2. or allocate a tmp_buf beforehand, do buf = __bio_kmap_atomic,
> copy buf into tmp_buf with memcpy, call __bio_kunmap_atomic, and
> then use tmp_buf for the rest of the operations.
>
> I recommend option 2.
>
> thanks,
> Daniel.
> [1] http://lxr.linux.no/linux+v3.3.1/arch/x86/mm/highmem_32.c#L24

Hi,

Let us know whether this solved it. In the meantime, here is
roughly what the two options look like (sketches only, not
drop-in code).
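Option 1: every allocation made while the atomic kmap is held must
use GFP_ATOMIC so it cannot sleep. The bio_alloc/alloc_page calls
below are just examples; apply it to whatever send_test_bio actually
allocates:

/* Option 1 (sketch): nothing allocated under the atomic kmap may sleep */
struct bio *bio = bio_alloc(GFP_ATOMIC, 1);   /* instead of GFP_KERNEL */
struct page *page = alloc_page(GFP_ATOMIC);   /* instead of GFP_KERNEL */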

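Option 2, the one I'd recommend, as a minimal sketch; len (buffer
length) and idx (bio_vec index) are placeholders you'd replace with
your own values:

/*
 * Option 2 (sketch): copy the data out of the atomic mapping first,
 * then do everything that may sleep on the copy.
 */
char *buf, *tmp_buf;

tmp_buf = kmalloc(len, GFP_KERNEL);     /* may sleep, so do it before the kmap */
if (!tmp_buf)
        return -ENOMEM;

buf = __bio_kmap_atomic(bio, idx, KM_USER0);  /* atomic section begins */
memcpy(tmp_buf, buf, len);
__bio_kunmap_atomic(buf, KM_USER0);           /* atomic section ends */

/* from here on tmp_buf can be used in code that is allowed to sleep */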
thanks,
Daniel.

